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FOREWORD 


This is the progress report on the research project “Surface Modeling and 
Optimization Studies of Aerodynamic Configurations.” Within the guidelines 
of the project, our attention was directed toward research activities in the 
area of surface reconstruction by optimization techniques and spline smooth- 
ing. The period of performance of this specific research was from January 
1, 1994 to December 31, 1994. Our research was supported by the NASA 
Langley Research Center through Cooperative Agreement NCC1-68. The 
cooperative agreement was monitored by Dr. Robert E. Smith Jr. of System 
and Information Devision (Scientific Application Branch), NASA Langley 
Research Center. 



AUTOMATION OF REVERSE ENGINEERING 
PROCESS IN AIRCRAFT MODELING AND 
RELATED OPTIMIZATION PROBLEMS 

Dr. W. Li and Dr. J. Swetits 
Old Dominion University 

Summary 

During the year of 1994, we studied reverse engineering problems in aircraft 
modeling. Our initial concern was to obtain a surface model with desirable 
geometric characteristics. Much of the effort during the first half of the year 
was to find an efficient way for solving a computationally difficult optimiza- 
tion model. Since the smoothing technique in the proposal “Surface Modeling 
and Optimization Studies of Aerodynamic Configurations” requires solutions 
of a sequence of large-scale quadratic programming problems, it is important 
to design algorithms that can solve each quadratic program in a few itera- 
tions. Our research led to 3 papers by Dr. W. Li, which were submitted 
to SIAM Journal on Optimization and Mathematical Programming. Two of 
these papers have been accepted for publication. 

Even though significant progress has been made during this phase of re- 
search and computation time was reduced from 30 minutes to 2 minutes for 
a sample problem, it was not good enough for on-line processing of digitized 
data points. After discussion with Dr. Robert E. Smith Jr., we decided not to 
enforce shape constraints in order to simplify the model. As a consequence, 
we adopted P. Dierckx’s nonpar ametric spline fitting approach, where one 
has only one control parameter for the fitting process - the error tolerance. 
At the same time we also tested the surface modeling software developed by 
Imagewaxe. Our research indicates a substantially improved fitting of digi- 
tized data points can be achieved if a proper parameterization of the spline 
surface is chosen. A winning strategy is to incorporate Dierckx’s surface 
fitting with a natural parameterization for aircraft parts. 

The report consists of 4 chapters. Chapter 1 provides an overview of re- 
verse engineering related to aircraft modeling and some preliminary findings 
of our effort in the second half of the year. Chapters 2-4 are the research 
results by Dr. W. Li on penalty functions and conjugate gradient methods 
for quadratic programming problems. 
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Chapter 1 

Automation of Reverse 
Engineering Process in 
Aircraft Modeling 


In this chapter we analyze the reverse engineering process in aircraft modeling 
and point our what can make the reverse engineering process more accurate 
and efficient. 


1.1 Introduction 

Reverse engineering is a relatively new terminology emerging from late 80’s 
[3]— [15] . If one researches the literature on reverse engineering, one will find 
references on reverse engineering in software development and maintenance 
as well as reverse engineering related to computer-aided manufacturing. Per- 
haps it is appropriate to call it reverse mechanical engineering when it is 
related to modeling of a mechanical part. Mechanical engineering is a pro- 
cess that starts with a design and ends with a mechanical product. Therefore, 
any process that starts with a mechanical part and ends with a design might 
be called reverse mechanical engineering. Recently, with the development 
of advanced laser digitizer and computer-aided manufacturing, reverse me- 
chanical engineering is not only a research initiative but also an industrial 
reality. Many researchers are looking for ways of building a “gigantic me- 
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chanical copying machine” which automatically produces a replica of any 
given mechanical part (or its scaled model). 

Reverse mechanical engineering has many applications, such as manufac- 
turing of a part from a new physical model, replication of a mechanical part 
that does not have a computer-recognizable design, analysis/modification of 
a mechanical part that does not have a computer-recognizable design, and 
verification of a mechanical part that has a computer-recognizable design. 
For aircraft modeling, reverse engineering can be used for verification of the 
reliability/accuracy of a scaled-down model based on a computer design and 
CFD analysis of the structure of a new aircraft model based on its spline 
surface representation. 

In general, a reverse (mechanical) engineering process consists of 5 pro- 
cedures: (1) acquisition of data points (it is done by using some coordinate 
measuring machine, most likely a laser digitizer), (2) separation of data (it is 
related to pattern recognition in artificial intelligence, such as extracting the 
data points on a wing of an aircraft), or feature identification of the model, 
(3) surface fitting of each part that has a simple geometric structure, (4) 
reassembling of parts by using CAD programs, (5) computer-aided manufac- 
turing (CAM) of the original model based on the mathematical model. 

The important issues related to reverse engineering process in aircraft 
modeling are accuracy and efficiency. With enough resources, one can create 
spline surface patches from digitized data points that represent the original 
model. However, there are a few sources of inaccuracy in the modeling pro- 
cess: (1) imperfection of the original model, (2) error in measurement by a 
laser digitizer, (3) missing data points, and (4) error occurred in reconstruc- 
tion of surface patches based on the digitized data points. The first three 
types of errors are not our primary concern at the moment. As a matter of 
fact, they are relatively insignificant when compared with the human error 
in reconstruction of surface patches. In order to produce an acceptable sur- 
face model, we have to reduce fitting errors in surface reconstruction. With 
advanced digitizing technology, it takes a few seconds to measure millions of 
data points on a surface. How to process a huge amount of data points be- 
comes a critical issue in surface reconstruction. Therefore, in order to make 
the reverse engineering process work, we have to address accuracy and effi- 
ciency issues in reconstruction of spline surface patches from digitized data 
points. 
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1.2 Issues in Surface Reconstruction 

The reconstruction of spline surface patches from digitized data points can 
be roughly described as the following problem: given a set of measured data 
points {(xi, yi, z») : 1 < i < n) on an aircraft, find spline surface patches that 
represent the aircraft “accurately” and “efficiently”. This seemingly naive 
mathematical problem is actually very difficult and complicated (cf. [3]). 
It involves many different fields in mathematics, such as Computer-Aided 
Geometric Design (CAGD), Approximation Theory, Numerical Optimization 
Techniques, Nonparametric Regression in Statistics, and Numerical Linear 
Algebra (for software coding). 

First let us consider the accuracy issue. Ideally, one wants to find a spline 
surface such that all the measured points are on the surface. Then we know 
that, at least at those measured points, the spline surface faithfully repre- 
sents the aircraft model. However, there are several reasons for us not to use 
such a spline surface: (1) due to imperfection of the aircraft model and/or 
measurement error, the measured point positions are not accurate and such 
a spline surface normally has local oscillatory structure which is not realis- 
tic as an aircraft’s surface; (2) computationally, it might be impossible to 
find such a surface and the related mathematical problem is called spline 
interpolation in approximation theory; (3) even if one can find such a spline 
surface, it usually requires a complicated mathematical form and its repre- 
sentation might have tens of thousands of coefficients which is not desirable. 
As a consequence, one has to set an error tolerance for the spline surface 
representation of an aircraft model. A common practice is to use the stan- 
dard deviation as a measure of accuracy of the spline surface representation. 
However, error analysis of fitting data points by a spline surface is a quite 
complicated mathematical problem. Our objective is to improve surface fit- 
ting techniques so that better surface representations can be produced as a 
result. 

Another issue is efficiency. There are two kinds of efficiency in surface 
reconstruction: efficiency in use of resources and efficiency in mathematical 
surface representation. If one has to spend days and months in order to 
produce a desirable surface by using a software package, then the software 
is not efficient for the reconstruction of a surface. If the mathematical sur- 
face needs tens of thousands of parameters for its representation, then the 
representation is not efficient. Here our objective is to design programs that 
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can quickly produce a spline surface with only a few hundreds of parameters, 
even though the digitized data set might contain a quarter of a million to a 
million of data points. 

However, there is a conflict in achieving our objectives. In general, a 
more accurate representation of the data points yields a more complicated 
spline surface which requires more parameters for its mathematical form. 
Sometimes, we can not improve the accuracy of a surface representation 
while using a simple mathematical form, and there has to be a trade-off 
between the accuracy of a surface approximation and the simplicity of surface 
representation. One might use some strategy in image compression, where 
the rate of compression conflicts with the quality of the compressed image. In 
wavelets compression of digitized images, Saito [1] used minimal information 
description length to determine the optimal trade-off between accuracy and 
simplicity. Similar ideas might be used here. 


1.3 Description of Research Results 

How can we design a program that produces a simple and accurate spline 
surface representation of millions of digitized data points? In this section, 
we describe one possible way to accomplish this task. 

The reconstruction of a surface from digitized data points consists of 
the following processes: (1) separation of a model into geometrically simple 
paxts, such as sepaxation of an aircraft into wings, tails, and fuselage, etc. 
(this process is called data sepaxation in pattern recognition engineering); (2) 
representation of each part as a spline surface (surface fitting); (3) reassem- 
bling of the spline patches as one surface that represents the original model. 
So far, our effort is on the surface fitting part of the modeling process. The 
reason is that inaccurate surface fitting results in erroneous spline surface 
representation of an aircraft model. 

During the summer of 1994 when our research was funded by NASA 
through the grant NCC-1-68 Supplement-15 from NASA Langley Research 
Center, we studied vaxious techniques for surface fitting. In particular, we 
studied a shape-control regression model by forcing constraints on deriva- 
tives, the B-spline curve/surface modeling software “SURFACER” developed 
by Imageware, and the curve/surface fitting package in NETLIB library de- 
signed by Dierckx. Here is a summary of our findings: 
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1. Even though a shape-control regression model can produce a surface 
with a desirable shape, it is very time consuming. The shape-control 
regression model by B-splines is more complicated and the related op- 
timization problem is more difficult to solve. Therefore, we abandoned 
the shape-control modeling idea temporarily, since long turn-around 
time is not acceptable in practice. However, we do want to produce 
spline surfaces with certain geometric features, such as convexity or bi- 
convexity. Especially, the shape of wings are crucial for its aerodynamic 
characteristics. What we hope for is that, if the fitting of digitized data 
is accurate enough, then the resulting surface would resemble the actual 
model. 

2. Our overall impression of "SURFACER” is very good. Its interface 
is user-friendly and one can learn how to use the package in a short 
time. The package is a state of art product and has great potentials. 
However, there are three aspects of this product which still require 
further research and development: (1) automatic pattern separation, 
(2) parameterization of spline surface, and (3) surface fitting. Right 
now, one has to identify each part of a model by naked eyes, even 
though it has some artificial intelligence features to help the user. The 
objective should make the interface transparent to the user so that one 
knows how those automatic pattern recognition features can be used. 
For a given set of data points, the spline surface generated by the 
package is heavily dependent on the boundary curves one creates for 
the data set. In some cases, the boundary curves completely determine 
the final output, which is undesirable. The problem is the choice of 
parameterization of the fitting spline surface. Due to the same problem, 
the free form surface fitting is not reliable at all as pointed out by Dr. 
Kurt Skifstad, the president of Imageware. Therefore, it is easy to find 
the 4 boundary curves and a B-spline surface by using the package, but 
the result might not be desirable. Then you have to go back and repeat 
the process again. Sometimes delicate decisions have to be made during 
the boundary curve fitting process so that the final spline surface would 
be acceptable. 

3. The curve/surface fitting package designed by Dierckx [2] is perfect for 
aircraft modeling in the sense that there is only one control parameter: 
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the fitting error tolerance. If you want to fit a set of data points by 
a B-spline surface, then you provide a tolerance on how far you allow 
the spline surface to deviate from the given data set and the program 
does the rest: it decides how many knots axe needed, where to put 
those knots, and finally produces a smoothest spline surface among the 
ones that fit the data set within the given error tolerance. We believe 
that this is the best strategy for aircraft modeling: finding the simplest 
spline surface that fits the data with an acceptable error tolerance. 
However, Dierckx’s spline surface fitting subroutine can only fit a data 
set {(xj, yi,Zi) : 1 < i < n) where (xi, y.) axe points inside a rectangu- 
lar region. But, in general, one first fits the boundaxy of a data set by 
4 boundaxy curves, then uses the Gordan-Coons interpolant to get a 
paxameterization of the surface, and finally fits the data set by 3 spline 
surfaces (x(tt, v), y(u, u), z(u, t>)) where (u,u) are parameters defined on 
a rectangular region. If we use Dierckx’s curve/surface fitting package, 
it will enhance the performance of “SURFACER”, where one has to 
use m an y different parameters to control the fitting process. For de- 
signers, more control parameters provide flexibility in design process; 
but, for reverse engineering, many different control parameter could be 
a nightmare when deciding which set of parameters would give the best 
fit. 


1.4 Future Research Initiatives 

Our objective is to enhance some aspects of “SURFACER”, which will be 
purchased by NASA Langley Reseaxch Center, so that it can produce better 
spline fitting of digitized data points of an aircraft. Here axe some ideas 
which can make the surface reconstruction more efficient and more accurate: 

1. automation of some steps in the surface modeling process so that it 
takes less time to produce a desirable spline surface model, 

2. a better paxameterization method for the fitting splines so that more 
reliable and more accurate spline surfaces can be generated during the 
fitting process. 

Note that any sound automation not only reduces the cost of producing a 
surface model (in terms of computer time and manpower), but also eliminates 
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potential human errors in the process. Therefore, automation is not only an 
efficiency issue but also an accuracy one. Our goal is to have a program that 
takes an error tolerance and 4 corner points (in the digitized data set) and 
produces the simplest spline surface (it can) that fits the data set within the 
given error tolerance. Technically, we would like to solve the following three 
mathematical problems: 

1. Fit the boundary points by a moving spline frame so that one does not 
have to fit the boundary points by 4 spline curves which are bonded 
together by CAGD programs. The reason is that additional errors 
result from the so-called stitching process (in “SURFACER”). What 
we want is to design a program that takes an error tolerance and four 
corner points and produces a closed spline curve with 4 corner points 
(which we call a spline frame). One possible approach is to use spline 
surve fitting with interpolation of 4 corner points. However, due to 
unreliability of digitized data, it is better to allow the fitting program to 
decide which positions should be the corner points by using information 
on positions of all boundary points. In statistical terms, we want to 
use all boundary points to predict the corner points. This will make 
the program more robust. 

2. Use a natural parametric space for the digitized surface. For example, it 
is natural to use cylindrical coordinates for fuselage, conical coordinates 
for the sharp front of an aircraft, and rectangular coordinates for wings. 
From our experiments, an appropriate choice of the parametric space 
c an make a dramatic difference in accuracy of the fitting process and 
can prevent some undesirable side-effects, such as the drifting of the 
fitting spline surface in some coordinate direction. 

3. Use inverse Gordan-Coons mapping to reformulate the fitting problem 
as a fitting problem on a rectangular region. The current practice is to 
use the Gordan-Coons mapping instead of the inverse Gordan-Coons 
mapping. For example, suppose that we have a set of data points 
{(xi,yi,Zi) : i < i < n.}, where z, = /(x;,y<) for some unknown func- 
tion f(x , y) and (x<, yi) are uniformly distributed in some region S . One 
can imagine that S is the projection (or shadow) of a wing. By using 
4 boundary curve representation of the boundary points of the given 
data set, one can explicitly write down the Gordan-Coons mapping that 
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maps [0, 1] x [0, 1] to S. If we want to produce a surface model of the 
wing by the parametric equations: x = x(u, v),y = y(u,v),z = z(u,v ) 
for 0 < u < 1,0 < v < 1 , then z* should equal z(u;,i>;) where ( ) 
should be determined by Xi = x(u;,i>; ) and j/i = y(ui,Vi). Note that 
x = x(u,v),y = y(u, v) define the Gordan-Coons mapping that maps 
[0, 1] x [0, 1] to S. Therefore, (u;, Vi ) is the image of the inverse Gordan- 
Coons mapping. Finding (it;, u;) involves solving a system of two non- 
linear equations, which should not be too difficult. But a common 
practice is to use a grid net of the rectangular region [0, 1] x [0, 1] 
and, for each (u„u t ), find a data point !/«(*, t)) that is clos- 

est to (x(u„v,),y{u„v t )). Then fit (u„u t , z i(iil) ) by a spline surface 
z = z(u,v). One can clearly see that artificial errors are introduced in 
this process. 
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Chapter 2 

Differentiable Piecewise 
Quadratic Exact Penalty 
Functions for Quadratic 
Programs with Simple Bound 
Constraints 


The quadratic program with simple bound constraints is reformulated as 
unconstrained minimization of a differentiable piecewise quadratic function. 
The two problems have the same set of local solutions, the same set of iso- 
lated local solutions, and the same set of global solutions. Unlike other 
penalty functions, a parameter involved in the unconstrained reformulation 
can be easily determined by the spectrum radius of the Hessian of the objec- 
tive function in the original quadratic program. Moreover, the exact penalty 
function can also be derived from Hestenes-Powell-Rockafellar’s augmented 
Lagrangian function for two-sided inequality constrained minimization prob- 
lems by using Fletcher’s multiplier function. 
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2.1 Introduction 

Consider the following quadratic program with simple bound constraints: 

min \x T Mx — b T x, (2-1) 

l<x<u 2 

where M is an n x n symmetric matrix, b 6 R n (a vector of n components), 
and l,u axe vectors of n components with l < u. (Note that, if k = U{ , then 
Xi = k and one can replace x t by l, in (2.1) and reformulate the problem so 
that l < u.) Some components of l or u may be — oo or +oo. 

One can take advantage of the special structure of (2.1) to design numer- 
ical algorithms for solving (2.1) [1, 3, 4, 5, 6, 7, 8, 9, 11, 8, 16, 9, 18, 13, 
14, 22, 23, 24, 28, 19, 20, 24, 34, 19, 36, 41, 42, 21, 44, 45, 50, 51, 26]. The 
article by More and Toraldo [42] contains references on quadratic programs 
with simple bound constraints in engineering applications. 

When M is positive semidefinite, Li and Swetits reformulated (2.1) as the 
following unconstrained minimization problem [24, 25]: 

min '® r o(a ; )> (2-2) 

xgR" 

where To(x) is a convex quadratic spline defined as follows: 

*o(») := - E 2 )x - x T Eh 

+ UK/ - (Ex + h)) + 1| 2 + HI ((Ex + h)~ u) + || 2 . 

Here a is a positive constant such that 0 < a||Af|[ < 1, \\M\\ is the 2-norm 
of the matrix M, E := I - acM, and h := ab. It was proved that x* is a 
solution to (2.1) if and only if x* is a minimizer of ^o(®) [24, 25]. Therefore, 
algorithms for unconstrained minimization of the convex quadratic spline 
$o( a: ) can be used to find a solution to (2.1). 

When M is a positive definite matrix, \I f o(£) is actually a strictly convex 
quadratic spline. In this case, one can use either a Newton method with line 
search to find the unique solution x* of (2.1) in finite iterations [24, 25] or a 
conjugate gradient method to generate a sequence of iterates which converge 
linearly to x* [20]. Moreover, the conjugate gradient method finds x* in finite 
iterations if x* is nondegenerate [20]. In general, it is very easy to design 
a linearly convergent descent method for finding a minimizer of a convex 
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quadratic spline which is bounded below on R n , even if the set of minimizers 
of the convex quadratic spline is unbounded [19]. As a consequence, by 
the unconstrained reformulation (2.2), one can easily generate a sequence of 
iterates which converges linearly to a solution of (2.1) when M is positive 
semidefinite and (2.1) has a solution [19]. 

The unconstrained reformulation of (2.1) was solely based on the obser- 
vation that x* is a stationary point cf ^o(x) (i.e., ^(x*) = 0) if and only if 
x* satisfies the first order optimality conditions of (2.1) (i.e., (x*,Mx* — b ) 
is a Karush-Kuhn-Tucker point of (2.1)) [24, 25]. Therefore, it is unclear 
whether the minimizers of 'J'o(x) correspond to the solutions of (2.1) when 
M is not positive semidefinite. In order to establish the correspondence with- 
out the convexity assumption, we have to relate the second order optimality 
conditions of (2.1) to those of (2.2). 

When 0 < 2a||M|[ < 1, based on careful manipulations of the second 
order optimality conditions and an observation that 'J'o(x) is actually the 
sum of the objective function of (2.1) and a quadratic spline penalty term, 
we prove that x* is a local solution (or an isolated local solution) of (2.1) 
if and only if x* is a local minimizer (or an isolated local minimizer) of 
V 0 (x). Moreover, x* is a global solution of (2.1) if and only if x* is a global 
minimizer of ^o(x). In other words, the unconstrained reformulation (2.2) 
is “absolutely” equivalent to (2.1) when 0 < 2o;|[M|[ < 1. We want to point 
out that 'fo(x) can also be derived from the Hestenes-Powell-Rockafellar’s 
augmented Lagrangian function for two-sided inequality constraints by using 
Fletcher’s multiplier function. See Section 2 for details. 

In general, one can use exact penalty functions or exact augmented La- 
grangian functions to derive unconstrained reformulations of a constrained 
minimization problem (cf. [3, 13]). Consider the following constrained mini- 
mization problem: 

min /(x), (2.4) 

xE-X 

where X C R n is the feasible set defined by some linear or nonlinear con- 
straints. In order to reformulate (2.4) as an unconstrained minimization 
problem, we introduce an exact penalty function F(x,e) which contains a 
penalty parameter e and consider the following unconstrained minimization 
problem: 

min F(x,e). (2.5) 

JV 
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The most important property of an exact penalty function is that its local (or 
global) minimizers are local (or global) solutions of the original constrained 
minimization problem, which is the main focus of most of the literature on 
this subject as noted by Han and Mangasarian [16]. This property ensures 
that any unconstrained minimization technique for finding a local (or global) 
minimizer of the penalty function yields a local (or global) solution of the 
original constrained problem. Usually, there is a threshold e* for the penalty 
parameter such that F(x,e) is exact if 0 < e < e*. However, in almost all 
cases, there is no simple way to compute the value of e*. Also, as pointed 
out by Di Pillo and Grippo [13], in practice, the threshold e* could only be 
established with reference to some compact subset V of R”. That is, in 
practice, one could only get a threshold t* such that, for 0 < e < e*, F(x, e) 
is exact for x in V. In order to measure the degree of exactness of a penalty 
function, formal definitions of exactness were introduced by Di Pillo and 
Grippo [13]. By Di Pillo and Grippo’s definition [13], for 0 < 2a||M|| < 1, 
\J/ 0 (£) is a strongly exact penalty function for (2.1) with respect to the set 

V = R n . 

There axe two other unconstrained reformulations of (2.1) given by Grippo 
and Lucidi [13] and by Coleman and Hulbert [5], respectively. Both refor- 
mulations are only valid for the case that the feasible region {x :l <x <u] 
is compact. 

In [13], Grippo and Lucidi started from the definition of differentiable 
multiplier functions that yield an estimate of the Karush-Kuhn- Tucker mul- 
tipliers as explicit functions of x. Then they constructed a penalty function 
P[x , e) containing barrier terms on a perturbation of the constraints. The 
penalty function P{x,e) is a differentiable piecewise rational function in an 
open neighborhood T> of the compact feasible region {x : l < x < u}. The 
definition of the threshold e* involves the maximum value of \\Mx — &||oo (the 
supremum norm of the vector (Mx - b )) over the feasible region {x : l < 
x < u} and the maximum value of some nonlinear expression of x and t. 
For 0 < e < e*, they proved that x* is a global minimizer of P(x, e) in V if 
and only if x* is a global solution of (2.1). Moreover, any local minimizer 
x* of P(x,e) is a local solution of (2.1). The converse is also true when 
(x*,Mx* — b) is a Karush-Kuhn-Tucker point satisfying strict complemen- 
tarity conditions. In a separate paper [14], based on the penalty function 
P(x,e), they proposed Newton-type algorithms to solve (2.1). Under suit- 
able assumptions, finite termination of a Newton method was established. 
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Grippo and Lucidi’s approach was adapted to tackle unbounded feasible sets 
by Facchinei and Lucidi in [9]. 

Coleman and Hulbert’s penalty function is a nondifferentiable piecewise 
quadratic function. They started with an l\ penalty term and the transform 
y := -(Mi — b ) that yield a penalty function /(y). Under the assumption 
that M is positive definite and the unique solution x * of (2.1) is nondegener- 
ate, they proved that a Newton-type method generates a sequence of iterates 
{y k } which converge superlinearly to y* := -(Mi* — b ) [5]. Note that one 
has to solve the linear system Mi = (6 — y*) in order to get the solution x*. 
One interesting feature of Coleman and Hulbert’s penalty function is that 
there is no penalty parameter involved. 

In comparison with Grippo and Lucidi’s penalty function P(x,e) and 
Coleman and Hulbert’s penalty function /(y), # o (a0 is simpler than P(x, e): 
a differentiable piecewise quadratic function versus a differentiable piecewise 
rational function; V 0 {x) is “smoother” than P(x,e) or /(y): differentiability 
on R n versus singularity of P(x,e) on the boundary of V (induced by the 
barrier terms) or nondifferentiability of /(y); ’® r o( a: ) i s "more exact” than 
P(x, e) or /(y): equivalence of local solutions without strict complementarity 
conditions for 'ko(x) versus with strict complementarity conditions for P(x, e) 
or with strict convexity assumption on (2.1) for /(y); the penalty parameter 
for ¥„(*) i s easier to compute than that for P(x, e): the trivial estimate of the 
threshold versus the complicated estimate of the threshold by maximization 
of linear and nonlinear functions; 'I r o(®) preserves the convexity of the original 
minimization problem (i.e., if the original quadratic program (2.1) is convex, 
then 'ko(z) is also convex), while P(x, t) or /(y) does not; ^(s) is valid for 
any feasible set while P(x, e) or /(y) is only defined for compact feasible sets. 

The paper is organized as follows. In Section 2, we outline how (2.1) 
can be reformulated as unconstrained minimization of a quadratic spline 
penalty function \k(x) and show that the penalty function 'J(x) can be 
derived from Hestenes-Powell-Rockafellar’s augmented Lagrangian function 
by using Fletcher’s multiplier function. An important relation between the 
penalty function 'P(x) and the objective function of (2.1) is also given. In 
Section 3, we relate the second order optimality conditions for (2.1) to those 
for unconstrained minimization of ’f(x). In Section 4, we prove that (2.1) 
and the unconstrained reformulation has the same set of local solutions, the 
same set of isolated local solutions, and the same set of global solutions. 
Conclusions and final remarks axe given in Section 5. 
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Now we conclude this section by giving some terminologies and notations 
used in the paper. 

For simplicity, we use f'{x) to denote the gradient of /(x) (as a column 
vector) and use /"( x) to denote the Hessian of /(x). A real-valued function 
f(x) on R n is said to be a quadratic spline, if the gradient f(x) of /(x) is 
a piecewise linear mapping from R n to R n . That is, a quadratic spline is a 
continuously differentiable piecewise quadratic function. The 2-norm || • || on 

R n is defined as |[x|| := (^T =1 x?)* and the 2-norm of an n x n matrix B is 
defined as ||R|[ := sup{||Rx|[ : x G R n with ||x|| = 1}. The transpose of a 
matrix B or a vector x is denoted by B T or x T . For x,y 6 R", x < y means 
Xi < y, for 1 < i < n, where x; or y; denotes the i-th component of x or y. 
Let (z)i (or (z)“) be the lower (or upper) truncation of z by l (or u) whose 
i-th component is max{i;,z;} (or min{ui, zi}). By convention, z+ is a vector 
whose i-th component is max{zi,0}. For convenience, we use A to denote a 
diagonal matrix whose diagonal entries are either 0 or 1. The n x n identity 
matrix is written as I and A c := I — A. A vector x* € X is said to be a local 
solution of (2.4) if there exists a positive constant S such that f(x ) > /(**) 
for x € X with ||x — x*|| < 6. A vector x* E X is said to be a isolated local 
solution of (2.4) if there exists a positive constant 6 such that /( x ) > /(*•) 
for x € AT and 0 < ||x — i*|| < 6. A vector x* € X is said to be a global 
solution of (2.4) if /(x) > /(x*) for all x € X. 


2.2 Unconstrained Reformulations 

In this section we outline how the quadratic program with simple bound 
constraints, (2.1), can be reformulated as unconstrained minimization of a 
quadratic spline \&(x) [24, 25], and show that ^(x) can be derived from 
the Hestenes-Powell-Rockafellar’s augmented Lagrangian function by using 
Fletcher’s multiplier function. Finally, an important relation between ^(x) 
and the objective function of (2.1) is given. 

It is well-known from the Karush-Kuhn-Tucker conditions that if x is a 
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solution of (2.1), then there exists w € R n such that, for 1 < i < n, 

Mx — b — w = 0, 

Xi — k if Wi > 0, 

X{ = Ui if Wi < 0, 
h < Xi < Ui if Wi = 0. 

One can verify that ( x*,w *) is a solution of (2.6) if and only 

w * = Mx* -b and x* = (x* - am*))*, 

where a is any positive constant (cf. [25]). By substituting w* = Mx* — b 
into (2.7), we observe that (2.6) is equivalent to a system of piecewise linear 
equations. However, by multiplying a special nonsingular matrix on both 
sides of (2.7), one can prove that (2.7) is equivalent to the normal equation 
of a quadratic spline (cf. [25]). The following reformulation of (2.1) follows 
from Theorem 2.6 in [25]. 

Lemma 1 Suppose that M is a symmetric matrix. Then (x*,Mx* — b) is a 
Karush- Kuhn- Tucker point for (2.1) if and only if x* satisfies the following 
piecewise linear equation: 

x = (Ex + h)(, (2.8) 

where E := I — aM, ct is any positive constant, and h := ab. Moreover, 
^ E(x — (Ex + h)() is the gradient of the following quadratic spline 

*(*) * h* T ( E ~ ~ i* TEh - !W f2 9 , 

+ill(( - (Ex + A)) + || 2 + £||((£z + h.) - u) + || ! . 

As a consequence, for 0 < cx||M|[ < 1, (x*,Mx* — b) is a Karush-Kuhn-Tucker 
point of (2.1) if and only if $'( x *) = 0. 

Remark. Reformulation of (2.6) as a system of piecewise linear equations 
x = (Ex + h)'i is a generalization of Mangasarian’s idea of reformulating a 
linear complementarity problem as a system of piecewise linear equations. 
Mangasarian’s reformulation led to matrix splitting algorithms for solving 
the linear complementarity problem [27]. 


( 2 . 6 ) 


(2.7) 
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Note that we scale the original 'J'o(^) in (2.2) by L and also add an extra 
term — |||6|| 2 . Lemma 1, not including the last statement, follows directly 
from Theorem 2.6 in [25]. When 0 < ct\\M\\ < 1, E := I - aM is a positive 
definite matrix. Therefore, ^(x*) = 0 if and only if \E(x* — (Ex* + h)f) = 0, 
which is equivalent to x* — (Ex*+h)f = 0. Prom the first statement of Lemma 
1 we know that ^(x*) = 0 if and only if (x*,Afx* — 6) is a Karush-Kuhn- 
Tucker point of (2.1). 

Now let us consider the following unconstrained minimization problem: 

min n \t(x), (2.10) 

xeR n 

which is equivalent to (2.2). If M is positive semidefinite and 0 < a|[M|[ < 1, 
then 'P(x) is a convex function. As a consequence, x* is a solution of (2.1) 
if and only if x* is a minimizer of ’P(x). Therefore, (2.1) can be reformu- 
lated as the unconstrained minimization problem (2.10) when M is positive 
semidefinite. However, the reformulation only preserves the first order opti- 
mality conditions. When M is not positive semidefinite, it becomes difficult 
to relate the second order optimality conditions for (2.1) with the second 
order optimality conditions for (2.10). Since (2.9) does not clearly show how 
^(x) is related to the original objective function \x T Mx — b T x and the con- 
straints l < x < u, we rewrite 'k(x) by substituting E = I — aM and h = ab 
into (2.9). Using simple algebraic manipulations, one can get the following 
explicit formula for 'I'(x). 


Lemma 2 For any a > 0, 

®(x) = [\x T Mx — b T x) — ||| {Mx 
+ h ||(( ^ -x ) + oc(Mx-b)) + \ 


bW 




|((x — u) — a(Mx — 6)) h 


Remark. The above penalty function can also be derived from Hestenes- 
Powell-Rockafellar’s quadratic augmented Lagrangian function. Consider the 
following constrained minimization problem: 

min /(x) subject to l < g(x) < u, (2-H) 

X 
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where / : R n — ► R and g : R n — > R m are twice continuously differentiable 
functions. The corresponding augmented Lagrangian function L(x,y,a) in- 
troduced independently by Hestenes [17, 18] and Powell [29] for equality con- 
straints and by Rockafellar [30, 31] for inequality constraints can be written 
in the following unified way: 


L(x,y,a) := f(x) + § (j(»(i) - u) +w) + | 

I fils'll 2 . 


“ 2 


( 2 . 12 ) 


where y is the Lagrangian multiplier corresponding to two-sided inequality 
constraints and a is a penalty parameter. If f(x ) = \x r Mx — b T x and 
g(x) — x, then the corresponding augmented Lagrangian function has the 
following expression: 


L(x,y,a) := \x T Mx — b T x + f 

+ ») + 

+ 9L 
' 2 

(l(/-x)-y) + 

f - INI 2 - 


(2.13) 


Note that the Lagrange multiplier y in (2.13) should satisfy the following 
equation: 

y = -(Mi — b ). (2-14) 

As a consequence, we have the following relation between 'P(x) and the aug- 
mented Lagrangian function L(x,y,a): 


$ (x) = L(x, —{Mx — b), a). 


(2.15) 


This is actually Fletcher’s idea of getting an exact penalty function depending 
only on x [10]. For example, suppose the gradients 

of the constraints are linearly independent for any x. Let g'(x) be the n x m 
matrix whose z-th column is the gradient g'^x ) of Then the Lagrangian 

multiplier y in (2.12) should satisfy the following equation: 

f(x) + g'(x)y = 0. (2.16) 
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If (2.16) has a solution y(x), then y(x) has the following expression: 

y(*) = - {(y'(*)V(*)) 1 (y'(*)) T /'(*)» 

which is Fletcher’s multiplier estimate. The function L(x,y(x),a) can be 
used as an exact penalty function for the two-sided inequality constrained 
minimization problem [31]. Fletcher [10] initially used y(x) to eliminate the 
Lagrangian multiplier in the augmented Lagrangian function for equality 
constrained minimization problems. Our derivation of the penalty function 
^(x) indicates that Fletcher’s idea can also be used for inequality constrained 
minimization problem. It is interesting to notice that, if we can solve the 
equation (2.16) to get a unique solution x(y), then L(x(y), y,a) is a penalty 
function depending only on y. The unconstrained reformulation of strictly 
convex quadratic program given in [24, 25] can be derived in this way [31], 
Now, one may consider that ^(x) is the sum of the objective function 

\x T Mx — b T x 

2 


and a quadratic spline penalty term 

P{x) := -f \\{Mx - 6)|| 2 + £ ||((/ - x) + <*(Mx - b)) + 


+ 2 ^ k(*-u)-a(Mx-6)) 


+ 


( 2 . 1 ?) 


The penalty function has some important properties that relate 'f(x) to the 
objective function of (2.1). 


Lemma 3 For any a > 0, 

vt(x) < {^x T Mx — b T x'j for x € R n with l < x < u. (2-18) 

Moreover, the equality in (2.18) holds if (x, Mx — b) is a Karush- Kuhn- Tucker 
point of (2.1). 

Proof. Let l < x < u. Define J\ := {t : (( l — x) + ot(Mx — 6)); > 0). Then, 
for i £ <7i, 

(a(Mx - b))i > ((/ - x) + a(Mx — b))i > 0 
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and 


((* - u) - a(Mx - b))i = (l - u)i - ((l -x) + a(Mx - 6)), < 0. 

Thus, 

a 2 (Mx - b) 2 i > (((l -x) + a(Mx - 6))+), 2 ig 

+ (((* - «) - a(Mx - &))+) 2 . 

Similarly, for i € J 2 ■= {* : ((x - u) - a(Mx - b))i > 0}, (2.19) holds. The 
above argument also shows that 

((1 - x) + a(Mx - 6))+((x - u) - a(Mx - b)) + = 0 (2.20) 

and Ji 0 J 2 = 0 (i.e. , J\ and J 2 are disjoint). Therefore, by (2.19), 

P(x) = -| E?=i (Mx - b) 2 + i £ i6Jl (((* " *) + <*( Mx ~ *))+)? 

(((* - U) - <*(M* - *))+)? < -I £*WMx - &) 2 < 0. 


Thus, for Z < x < u, 

'I'(x) = Qx T Mx - Z> T x^ + P(x) < Q x T Mx - b T x ^ . 

Now suppose that (x,Mx — b) is a Karush-Kuhn- Tucker point of (2.1). It 
follows from (2.20) that 


£ I ((I - x) + a(Mx - 6)) + | + £ |((x - u) - a(Mx - b)) + 
((l -x) + a(Mx - b)) + - ((x - u) - a(Mx - 6)) + | 


2a 


( 2 . 21 ) 


One can easily verify that = z + (Z — z)+ — (z — u)+ for any z € R n . Let 
z := x — a '.(Mx — b) = Ex + h. Then 


((l - x) + a(Mx - b)) + - ((x - u) — a(Mx - b)) + 
= (Ex + h)f — (Ex + h) = a(Mx — 6), 


( 2 . 22 ) 


because (Ex + h)f — x = 0 by Lemma 1. By (2.21) and (2.22), P(x ) = 0 and 
the equality holds in (2.18). ■ 
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2.3 Second Order Optimality Conditions 

All the results in this section axe attempts to establish relationships between 
the second order optimality conditions of (2.1) and (2.10). 

From the definition of ’I'(x) it is easy to verify that, if 'J'(-) is twice 
differentiable at x, then the Hessian ^"(x) has the following expression: 

#"(x) = i(£ - EAE), (2.23) 


where A := diag(Ax, • • • , A n ) with 
A, := 


1, if h < (Ex + h)i < Ui 
0, otherwise. 


Even in the case that (•) is not differentiable at x, we still have a similar 
expression. 


Theorem 4 Suppose that (x*,Mx* — b) is a Karush- Kuhn- Tucker point of 
(2.1). Then there exists a positive constant 6 such that, for any x € R n with 
||x — x*|| < 8, 

$(x) - (x*) = — — (x - x*) T (E - EAE)(x - x*), (2.24) 

where A := diag(Ai, • • ■ , A„) with 


Ai 


1, if k < (Ex* + h)i < Ui 
1, if U = (Ex* + h)i and (E(x — x*))i > 0 
1, if Ui = (Ex* + h)i and (E(x — x*))i < 0 
0, otherwise. 


(2.25) 


Proof. There exists a positive constant 6 such that 

li < (Ex + h)i < Ui, if U < (Ex* + h)i < Ui, 

li > (Ex + h)i, if U > (Ex* + h)i, (2.26) 

Ui < (Ex + h)i, if Ui < (Ex* + h)i, 
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w=w 

w 


whenever ||x — x*|| < 8. Let ||x — x*|| < 8 and A := diag(Ai,- •• ,\ n ) with A^ 
given in (2.25). Then, for x 9 := Ox + (1 - 0)x * with 0 < 0 < 1, g(0) := «(**) 
is a differentiable function of 0 and 

g\0) =(x-x*) T W(x e ) 

= I(* - X*) T (Exg - E{Exg + h)t) 

= J(* - x*) T {Exg - EA(Exg + h)~ EA c (Ex* + h)f), 

where the last equality follows from the definition of A. Thus, for 0 < 6 < 1, 
is a quadratic polynomial of 6 and 

g"{0) = i(® - x*f(E - £A£)(x - x’). 

By the Taylor expansion and ^'(x*) = 0, we have 

* (x) - *(x*) = 5 (1) - 5(0) = 5'(0) + |/(0) 

= (x - x*) T tf'(x*) + i(x - x*) T (E - EAE)(x - x*) 

= U x - x *) t ( e - eae )( x - x *y ■ 


Note that the Hessian of the objective function of (2.1) is M. Therefore, 
in order to relate the second order optimality conditions of (2.1) to those of 
(2.10), we have to study how ( E — EAE ) is related to M. 

Recall that we use A to denote a diagonal matrix whose diagonal entries 
are either 0 or 1, and A c = I - A. Since E = I - aM , by simple algebraic 
manipulations, one can verify the following matrix identities. 

Lemma 5 For any a, 

E - EAE - A c - aA c MA c + a(AMA - otMAM) (2.27) 


an.d 


EAMAE = AM A - aMAM - a(AMA) 2 

+aA c MAMA c + a 2 (MAM AM). 


(2.28) 


In order to derive the second order optimality conditions of (2.10) from 
those of (2.1), we give a lower bound estimate of x T (E — EAE)x. This 
estimate will allow us to prove that local solutions (or isolated local solutions) 
of (2.1) are also local minimizers (or isolated local minimizers) of 'P(x). 


ST 
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Lemma 6 Suppose that 0 < 2a||Af|| < 1. Then there exist a positive con- 
stant f3 such that 

x t {E - EAE)x > p(\\A c x\\ 2 + |[AMAx|| 2 ) + ax T EAMAEx. (2.29) 

Proof. By the definition of the 2-norm of an n x n matrix Q , for any y € R n , 
we have 

\\Qy}\ < ||Q|||(y||- 

By (2.30) and the Cauchy-Schwaxz inequality, we get 

y T Qy < MUM < HOIIIIyf- 

Let 7 be any positive number. Then 

||AMx|| 2 =\\AMAx + AMA c x\\ 2 

< (||AMAx|| + ||AMA c x||) 2 
<(\\AMAx\\ + \\A\\\\MA c x\\) 2 
<(\\AMAx\\ + \\M\\\\A c x\\) 2 

= \\AMAx\\ 2 + |[M|| 2 ||A c x|| 2 + 2|[M|||[A c x||||AMAx|[ 

< (1 + 7- 1 )I|AMAx|| 2 + ||M|| 2 (1 + 7)I|A c x|| 2 , 

where the first inequality is by the triangle inequality, the second inequality 
is from (2.30), the third inequality follows from (2.30) and ||A|[ < 1, and 
the last inequality follows from 2 st < 7 -1 s 2 + 7 1 2 . By (2.31) and (2.32) we 
obtain that 

<y 3 x T MAMAMx < o 3 ||M|||[AMx|| 2 

< o 3 ||M|| ((1 + 7 _1 )||AMAx|| 2 + ||M|| 2 (1 + 7)I|A c x|| 2 ) . 

Similarly, by (2.31), (2.30), and ||A|[ < 1, we can derive that 

ax T A C M A c x < a||M|||[A c x|| 2 (2.34) 

and 

a 2 x T A c MAMA c x < o 2 ||M|| 2 ||A c x|| 2 . (2.35) 


(2.30) 

(2.31) 

(2.32) 
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(2.36) 


By (2.27) and (2.28), we obtain that 

x T (E - EAE)x - ax T EAMAEx = x T A c x - ax T A c MA c x 
+a 2 x T (AMA) 2 x - a 2 x T A c MAMA c x - a 3 x T MAMAMx. 

Finally, by (2.36), (2.33), (2.34), and (2.35), we get the following estimate 

x T (E - EAE)x > (1 - a\\M\\ - a 2 \\M\\ 2 - a 3 ||M|| 3 (l + 7))||A c x|| 2 
+a: 2 (l - a||M||(l + 7 -1 ))|[AMAx || 2 + ax T EAMAEx. 


(2.37) 


Let 


and 


__ 1 ( a[|Mj| 1 — 2a||M|| + a 4 ||M|| 4 > 




2 \1 — a\\M\\ q! 3 |[M|| 3 (1 — a||Af||) 


/3 := min{l — ct||M|| — a 2 \\M || 2 — a 3 ||M|| 3 (l + 7), a 2 (l — o;||M||(l + 7 a ))}. 
Since 0 < 2a||M|| < 1, one can verify that 


0 < °1MI _ < < 1 - M\M\\ + “ 4 II«II , 


1 - Or||Af|| 


a 3 |[Af|j 3 (l — a||Af|[) ' 


(2.38) 


Since 
1 


- a.\\M\\ - a 2 | W - « 3 ||M|| 3 (1 + 7 ) = 1 _ 7a 3 ||M|| 3 . 

it is easy to verify that (2.38) implies that /? > 0. Finally, (2.29) follows from 
(2.37). ■ 


2.4 Equivalence of The Quadratic Program 
and Its Unconstrained Reformulation 

By using characterizations of local solutions and isolated local solutions of 
(2.1), we prove that a local solution (or an isolated local solution) of (2.1) is 
also a local minimizer (or an isolated local minimizer) of ^(x). By Lemma 
3, it is easy to show that the converse also holds. Moreover, x* is a global 
solution of (2.1) if and only if x* is a global minimizer of ^(x). 


25 



Lemma 7 [37] Let (x*,Mx* — b) be a Karush- Kuhn- Tucker point of (2.1). 
Then x* is a local solution of (2.1) if and only if x T Mx > 0 for x G R n 
which satisfies the following conditions: 

Xi = 0, if (Ex* + h)i < U or (Ex* + h)i > Ui 
Xi > 0, if li = (Ex* + h)i 
Xi < 0, if Ui = (Ex* + h)i. 

Lemma 8 [39] Let (x*,Mx* — b) be a Karush-Kuhn-Tucker point of (2.1). 
Then x* is an isolated local solution of (2.1) if and only if x T Mx > 0 for 
any nonzero vector x in R n which satisfies the following conditions: 

Xi = 0, if (Ex* + h)i < li or (Ex* + h)i > Ui 
Xi > 0, if U = (Ex* + h)i 
Xi < 0, if Ui = (Ex* + h)i. 

Remark. The condition given in Lemma 7 is a special case of McCormick’s 
second order necessary condition for a local minima and the condition given 
in Le mm a 8 is a special case of McCormick’s second order sufficient condition 
for a local minima [40]. 

Theorem 9 Suppose that 0 < 2a|[M[| <1. If x* is a local solution (or an 
isolated local solution) of (2.1), then x* is a local minimizer (or an isolated 
local minimizer) of^(x). 

Proof. If x* is a local solution (or an isolated local solution) of (2.1), then 
(x*,Mx* — b ) satisfies the Karush-Kuhn-Tucker conditions. By Theorem 4, 
there exist positive constants 6 and /3 such that (2.24) and (2.29) hold. Let 
x G R n with 0 < ||x — x*|| < 8. 

Note that x* is always a local solution of (2.1). (An isolated local solution 
is a local solution.) By Lemma 7 and the definition of A in (2.25), for 
x := A E(x — x*), we have x T Mx > 0; i.e., (x — x*) T EAM AE(x — x*) > 0. 
By (2.24) and (2.29), 

«(*) - ¥(**) > J?-(\\AMA(x - x*)\\ 2 + \\A c (x - x*)|| 2 ) > 0. (2.39) 
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Therefore, x* is a local minimizer of 'P(x). 

Now assume that x* is an isolated local solution of (2.1). We want to 
show that tf(x) - \&(x*) > 0. If A c (x - x*) ^ 0 or AMA(x - x*) ^ 0, then it 
follows from (2.39) that 'f(x) - ^(x*) > 0. Otherwise, 

A E(x - x *) = A(7 - aM)(x - x *) = A(7 - aM)A(x - x*) = A(x - x*). 

Since x - x* ^ 0 and A c (x - x*) = 0, we get AE(x - x*) = A(x - x*) ^ 0. 
By Lemma 8 and the definition of A in (2.25), for x := A E(x - x*), we have 
x T Mx > 0; i.e., (x - x*) T EAMAE(x - x*) > 0. Therefore, by (2.24) and 
(2.29), ’P(x) - $(x*) > 0. Hence, x* is an isolated local minimizer of 'f(x). 


Theorem 10 Suppose that 0 < a\\M\\ <1. If x* is a local minimizer (or an 
isolated local minimizer) of ^(x), then x* is a local solution (or an isolated 
local solution) of (2.1). 

Proof. Let /(x) := \x T Mx - b T x. Since x* is a local minimizer of \I>(x), 
^(x*) = 0 and there exists a positive constant 6 such that 

tf(x) > tf(x*) for x G R n with 0 < ||x — x*|| < 6. (2.40) 

By Lemma 1, (x*,Mx* — b) is a Karush-Kuhn- Tucker point of (2.1). Thus, 
by Lemma 3, ^(x*) = /(x*). It follows from Lemma 3 that 

/(x) > (x) > *(x*) = /(x*), 

whenever 0 < |[x — x*|| <6 and l < x < u. Therefore, x* is a local solution 
of (2.1). 

If x* is an isolated local minimizer of ^(x), then the strict inequality 
holds in (2.40) and, as a consequence, 

f(x) > ¥(x) > ®(x*) = /(x*) 

whenever 0 < |[x - x*|| <6 and l < x < u. Therefore, x* is an isolated local 
solution of (2.1). ■ 

Finally, we discuss the equivalence of global solutions of (2.1) and global 
minimizers of 'I'(x). First we have the following implication on the existence 
of global minimizers of 'P(x). 
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Theorem 11 Suppose that 0 < a|[M|[ < 1 .If (2.1) has a global solution , 
then ^(x) is bounded below on R n and has a global minimizer. 

Proof. Assume the contrary that ’t(x) is not bounded below on R". Then 
there exist x*’s such that 


lim (x*) = -oo. (2.41) 


Define 


J,* := {i : (. Ex k + h)i < h}, 

J k := {i : ( Ex k + h)i > u;}, 

Jo := {i:U< ( Ex k + h)i < Ui}. 


Since the index set {1, 2, • • • , n} has finitely many different subsets, we can 
select a subsequence {kj} such that J t 1 = J kl , Ju = J(j , and J 0 J = Jo 1 . 
Without loss of generality, we may replace the sequence by the subsequence 
and assume that there exist three subsets Jj, J u , and Jo of {lj 2, • • • , n} such 
that, for all k, 

Ji = {i: ( Ex k + h)i < h}, 

J u = {i:(Ex k + h)i> Ui }, (2.42) 

J 0 = {i : li < ( Ex k + h)i < uj. 

Let A := diag(A a , • ■ ■ , A„), where 


A f 


1, if i e Jo 
0, otherwise. 


Since E is nonsingular, there exists x° € R" such that 

(Ex 0 + h)i = k, if i € Ji, 

(Ex 0 + h)i = Uj, if i € J u , (2.43) 

(Ex 0 + h)i = (Ex 1 + h)i, if i € J 0 . 


Similarly as in the proof of Theorem 4, $(6x k + (1 - 6)x°) is a quadratic 
function of 6 for 0 < 0 < 1 and 

V(x k )-V(x°) = (x k -x°) T W(x°) + l-(x k -x 0 ) T (E-EAE)(x k -x 0 ). (2.44) 
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By Lemma 6, there exists a positive constant such that 

- x°) T {E - EAE)(x k - x°) > £||A c {x k - i°)|| 2 

+ f \\AMA(x k - x°)|| 2 + \(x k - x°) T EAMAE(x k - x°). 

(2A5) 

In the remaining part of the proof, we want to show that 

(. x k - x°) T V(x°) + hx k - x°) T EAMAE{x k - x°) 

Lt 

is bounded below by a linear function of (A c (x A: — x 0 )) and (AMA(x k — x 0 )). 
As a consequence, $(x fc ) is bounded below by a strictly convex quadratic 
function of ( A c [x k — x 0 )) and [AMA(x k — x 0 ))- 

Since l < (Ex 0 + h) < u and l < (Ex 0 + h) + A E(x k - x°) < u , we have 

aV'(x°) = E(x° - (Ex 0 + /Of) = E(x° - (Ex 0 + h )) = aE(Mx° - b ) (2.46) 


-oo < K < f((Ex° + h) + AE(x k - X 0 )) - f(Ex° + h) 

= (x k - x°) T EA(M(Ex° + h)-b ) (2.47) 

+ l(x k - x°) T EAMAE(x k - x°), 

where f(x ) := |x T Mx — b T x , k := \rdi< x <u f ( x ) — f(Ex° + h), and the 
equality in (2.47) is the Taylor expansion for f(x). By (2.46) and 

M(Ex° + h)-b = E(Mx° - 6), 

we get 

$'(x°) - EA(M(Ex° + h) - b) = (E - EAE)(Mx° - b). (2.48) 

By simple algebraic manipulations, one can derive from (2.27) that 

E - EAE = A C (EA C - a 2 MAM) + aAMAE. (2.49) 

It follows from (2.48) and (2.49) that 

(x k - x°) T V(x°) - (x k - x°) T EA(M(Ex° + h)-b ) 

= p T A c (x k — x°) + q T AMA(x k — x°), 
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where p := ( EA C — a 2 M AM)(M x° — b ) and q := aE(Mx° — 6). From (2.47) 
and (2.50) we obtain 

(x k - x 0 ) T tf'(x 0 ) + |(x* - x°) T EAMAE(x k - x°) 

> p T A c (x k — a: 0 ) + q T AMA(x k — x ° ) + a/c. 

Finally, by (2.44), (2.45), and (2.51), we obtain that 

tf(x*) - ¥(x°) > h(A c (x - x°),AMA(x - x 0 )), 

where 

%,*) := £(|| y\\ 2 + \\zr) + p T y + q T z + K. 

Since h(y,z ) is a strictly convex quadratic function of (y, z), there exists 
a constant rj such that h(y,z) > rj for all y,z € R". As a consequence, 
tf(x fc ) > ^(x 0 ) 4- 77 > — oo. The contradiction proves that v]/(x) is bounded 
below on R n . 

By Frank- Wolfe Theorem [9, 6], if a piecewise quadratic function is bounded 
below on R n , it has a global minimizes Therefore, ^(x) has a global mini- 
mizer. ■ 

Theorem 12 Let 0 < o;||M|| < 1. Then x* is a global solution of (2.1) if 
and only if x* is a global minimizer o/\I/(x). 

Proof. . Assume that x* is a global solution of (2.1). Then (x*, Mx* — b) is a 
Karush-Kuhn-Tucker point of (2.1) and, by Lemma 3, ^(x*) = /(x*), where 
/(x) := |x r Mx — b T x. By Theorem 11, 'P(x) also has a global minimizer x. 
Since $'(x) = 0, by Lemmas 1 and 3, (x,Mx - b) is a Kaxush-Kuhn- Tucker 
point of (2.1) and 

tf(x) = /(x) > /(x*) = tf(x*). 

Thus, x* is a global minimizer of \P(x). 

Now suppose that x* is a global minimizer of \J'(x). By Lemma 3, /(x*) = 
tf(x*) and 

f{x) > 'Jf(x) > «(*•) = /(x*) for x € R n with l < x < u. 
Therefore, x* is also a global solution of (2.1). ■ 
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2.5 


Conclusions 


We were able to show that, for 0 < 2a|[M|| < 1, the quadratic program (2.1) 
is “absolutely” equivalent to the unconstrained minimization of the quadratic 
spline \k(x) in every conceivable way: the two problems have the same set of 
local solutions, the same set of isolated local solutions, and the same set of 
global solutions. Moreover, the exac* penalty function ^(x) is convex if (2.1) 
is a convex quadratic program. Majthay and Mangasarian’s chaxacterizations 
of local solutions [37] and isolated local solutions [39], respectively, of (2.1) 
were essential in our analysis. 

Since M is symmetric, ||M||i = ||M||oo and we have the following upper 
bound of ||M||: 


||M|| < y'llMllillMHoo = max J2 Kil> 

-*- n j = i 

where rriij is the ( ij ) entry of M. See Section 2.2 of [19] for details. There- 
fore, 0 < 2a|[M|| < 1 if 

/ n 

0 < a < 2 • max V) 

\ l<i<n 

\ - - j=i 

This provides a simple and explicit estimate for threshold of the penalty 
parameter a. 

From the discussion in Section 2, we realize that the exact penalty func- 
tion 'I'(x) could actually be derived from Hestenes-Powell-Rockafellar’s aug- 
mented Lagrangian function for two-sided inequality constrained problems 
by using Fletcher’s multiplier function. The process indicates that one may 
obtain an exact penalty function by eliminating either the primal variable 
x or the dual variable multiplier y from the augmented Lagrangian func- 
tion by simply using the equations in the Karush-Kuhn- Tucker conditions 
for the constrained minimization problem. At least the approach works for 
the quadratic program with simple bound constraints. Further results for 
general nonlinear programming problems can be found in [31]. 

Algorithms based on the exact penalty function ^(x) to solve (2.1) were 
discussed in [24, 25, 19, 20] when M is positive semidefinite. It would be 
interesting to study how the exact penalty function ^(x) can be used to 
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design new numerical algorithms for solving nonconvex quadratic programs 
with simple bound constraints. 

Acknowledgement. The author would like to thank Prof. 0. L. Mangasar- 
ian for providing references [37, 39] on characterizations of local solutions and 
isolated local solutions of a quadratic program. 
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Chapter 3 

A Conjugate Gradient Method 
for Strictly Convex Quadratic 
Programs with Simple Bound 
Constraints 


In this paper, we show that an analogue of the classical conjugate gradient 
method converges linearly when applied to solving the problem of uncon- 
strained minimization of a strictly convex quadratic spline. Since a strictly 
convex quadratic program with simple bound constraints can be reformu- 
lated as unconstrained minimization of a strictly convex quadratic spline, 
the conjugate gradient method is used to solve the unconstrained reformula- 
tion and find the solution of the original quadratic program. In particular, 
if the solution of the original quadratic program is nondegenerate, then the 
conjugate gradient method finds the solution in finite iterations. 


3.1 Introduction 

Consider the following convex quadratic programming problem: 

min \x T Mx — b T x 
subject to l < Ax < u, 
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where M is an n x n symmetric positive semidefinite matrix, A is an m x 
n matrix, x,b £ R n , and l,u are vectors of m components with U < Ui 
(some components of l,u might be ±oo). See [19] for a survey on iterative 
methods for solving (3.1). Recently, Li and Swetits reformulated (3.1) as 
an unconstrained minimization problem with a convex quadratic spline as 
the objective function, whenever M is positive definite or A is the identity 
matrix [24, 25]. Here we say that a function f(x ) is a quadratic spline if 
f(x) is a differentiable piecewise quadratic function. Therefore, algorithms 
for unconstrained minimization of a convex quadratic spline can be used to 
solve convex quadratic programs with simple bound constraints and strictly 
convex quadratic programs. In [24], a Newton method with line search was 
proposed to solve (3.1) in finite iterations when M is positive definite and 
the rows of A are linearly independent. Later, a modified version of the 
Newton method was given in [25] which can also find the solution of (3.1) in 
finite iterations, under the assumption that M is positive definite. All these 
finite algorithms axe based on Newton methods for solving the unconstrained 
reformulation of (3.1) and can start with any initial guess. However, these 
methods involve solving a linear system in each iteration. They axe most 
efficient for problems where M is a diagonal matrix and A has a banded 
structure or other spaxse patterns. For general cases, it seems a good idea 
to use a “cheap” descent method for a good initial guess and then to use 
a Newton method for a more accurate numerical solution of (3.1). This 
leads to a general theory of lineaxly convergent descent methods for finding 
a minimizer of a convex quadratic spline which is bounded below on R n [19]. 
It turns out that it is extremely easy to design a lineaxly convergent descent 
method for unconstrained minimization of a convex quadratic spline, even if 
the set of minimizers of the convex quadratic spline is unbounded [19]. It 
is clear now that a reformulation of (3.1) as unconstrained minimization of 
a convex quadratic spline allows one to develop new algorithms for solving 
(3.1) [23, 24, 25, 19]. Somehow, the problem of unconstrained minimization 
of a strictly convex quadratic spline is very similar to a linear system with 
a symmetric positive definite matrix, while the problem of unconstrained 
minimization of a convex quadratic spline is similar to a linear system with 
a symmetric positive semidefinite matrix. During a discussion with J.-S. 
Pang, he suggested that a conjugate gradient method might be developed 
for unconstrained minimization of a (strictly) convex quadratic spline. In 
this paper, we show that what Pang suggested can be done. 
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In past, various conjugate gradient methods were proposed for solving 
the following unconstrained minimization problem: 


Jfr /(I) ’ 


(3.2) 


where /(x ) is a function bounded below on R". The major concerns in the 
design of those conjugate gradient methods were how to avoid the compu- 
tation of Hessians and how to implement inexact line searches so that the 
algorithm would be computationally efficient. The standard assumptions 
are that the objective function is twice continuously differentiable and has 
bounded level sets. The global convergence of a conjugate gradient method 
means that 


lim inf |[/ , (x fc )|| = 0, 


(3-3) 


where /'(x ) is the gradient of /(x) and {x k } are the iterates generated in the 
following way: 

P M -/'(**) + ftp 1 . 

fc+1 k i k \ ' ) 

x +l := x K + oikP . 

Here x k ,p k are vectors in R n , x° is the given initial point, p° := 0, and a;*, fa 
are scalars which characterize the underlying conjugate gradient method. See 
[1, 2, 7, 10, 11, 12, 22, 29] for various rules of selecting o'*, fa to get a globally 
convergent conjugate gradient method. 

When f(x) is a convex quadratic spline, it is very easy to evaluate its 
Hessian (if it exists) and one can also use a linear time algorithm for line 
search [4, 5, 13, 24]. The difficult issues in the design of conjugate gradient 
methods for solving (3.1) with a general twice differentiable objective function 
f(x) will disappear once /(x) becomes a convex quadratic spline. However, 
some new difficult problems will emerge. First of all, a quadratic function has 
a piecewise linear mapping as its gradient and is not twice differentiable in 
general. Secondly, the set of minimizers of a convex quadratic spline might be 
unbounded; therefore, we can not assume bounded level sets. But the simple 
structure of a convex quadratic spline /(x) allows us to develop conjugate 
gradient methods which generate iterates {x^} such that 


dist (x fc ,**) <y(fo + \/7o)(l- 


(l+/o) 2 , 


for k > 0, (3.5) 
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where /„ := f(x°) - inf /(x), 6, 7 are positive constants depending only 
on /, X* := {x* € R n : /(x*) = inf^R" /(x)} is the set of all minimizers 
of /(x), and dist(x A; , A - *) := min{|[x fe — x*|[ : x* € X*} is the distance from 
x k to X* . Note that (3.5) implies that any accumulation point of {x fc } is a 
minimizer of /(x) and {x*} converge linearly to the unique minimizer of /(x) 
if X* is a singleton. Also we can prove that {||/'(x fc )||} converge linearly to 
zero. However, in general, if the solution set is not a singleton, we do not 
know whether or not the iterates {x*} converge. 

The proposed convergent conjugate gradient methods may be applied to 
solving a quadratic programming problem whenever it can be reformulated as 
unconstrained minimization of a convex quadratic spline. In this paper, we 
only discuss applications to strictly convex quadratic programs with simple 
bound constraints: 

min -x T Mx — b T x, (3-6) 

1<X<U 2 

where M is an n x n symmetric positive definite matrix, b € R n (a vector of n 
components), and l , u axe vectors of n components with l < u. (Note that, if 
li = Uj, then x; = U and one can replace x; by U in (3.6) and reformulate the 
problem so that l < u.) The article by More and Toraldo [20] has an extensive 
list of references on quadratic programs with simple bound constraints. 

In 1969, Polyak [23] proposed a conjugate gradient method for solving 

(3.6) . Since then, there were several articles devoted to discussions on conju- 
gate gradient methods for solving (3.6) [21, 26, 20]. However, these conjugate 
gradient methods do not fit the framework (3.4). In general, for an iterate 
x k which is not feasible, a special procedure must be used to modify x k so 
that the new x k satisfies the simple bound constraints l < x k < u. Since 

(3.6) can be reformulated as unconstrained minimization of a strictly convex 
quadratic spline function [25], we can use a natural extension of the classical 
conjugate gradient method to solve the unconstrained reformulation and find 
the solution of (3.6). Note that this approach is completely different from 
the existing conjugate gradient methods for solving (3.6), which axe closely 
related to active set methods. If the quadratic programming problem (3.6) 
has a nondegenerate solution, then our conjugate gradient method finds its 
solution in finite iterations. As we mentioned before [24, 25], one can 
use a finite Newton method to solve a strictly convex quadric programming 
problem. However, if the matrix M is sparse and the matrix inversion for 
computing the Newton direction is not desirable, then our conjugate gradient 
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method provides an appealing alternative for solving (3.6). 

The paper is organized as follows. In Section 2, we discuss a class of conju- 
gate gradient methods for unconstrained minimization of a convex quadratic 
spline function and establish the global error estimate (3.5). Then, we prove 
that a natural extension of the classical conjugate gradient method finds the 
unique minimizer x* of a strictly convex quadratic spline function f(x) in fi- 
nite iterations, if /(x) is twice differentiable at x*. In Section 3, we show that 
if the solution x* to (3.6) is nondegenerate, then the strictly convex quadratic 
spline function obtained in the unconstrained reformulation of (3.6) is twice 
differentiable at x*. As a consequence, we have a conjugate gradient method 
for (3.6) which generates a sequence of iterates {x fc } such that {x^} converge 
linearly to x*. Moreover, x k = x* for k large enough if x* is nondegenerate. 
Some final conclusions axe given in Section 4. 

Now we conclude this section by giving some terminologies and notations 
used in the paper. 

For simplicity, we use /'(x) to denote the gradient of /(x) (as a column 
vector) and use f"(x) to denote the Hessian of /(x). A real-valued function 
/(x) on R n is said to be a quadratic spline, if the gradient f(x) of /(x) is 
a piecewise linear mapping from R" to R/\ That is, a quadratic spline is 
a continuously differentiable piecewise quadratic function. Note that f"(x) 
might not exist if / is a quadratic spline. However, we can have a collection of 
closed convex polyhedral subsets { ^ = i of R n such that /(x) is a quadratic 
function on each Wi and U;=i IF, = R". Then /"(x) exists in the interior of 
each Wi but might not exist on the boundary of W;. For convenience, we use 
H{x) to denote the Hessian of / restricted on some Wi containing x. Note 
that, if x is contained in several Wi’s, then one can arbitrarily choose one 
of such Wj’s. The 2-norm || • || on R n is defined as |[x|| := (£” =1 x?) 2 . The 
transpose of a matrix B or a vector x is denoted by B T or x T . For x, y € R n , 
x < y means Xi < yi for 1 < i < n. We say that iterates {x fc } converge 
linearly to x* if there exist positive constants 7 and 6 such that 0 < 6 < 1 
and \\x k — x*|| < 7 6 k for k > 1. 
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3.2 Conjugate Gradient Methods for Con- 
vex Quadratic Splines 

In this section we propose a class of conjugate gradient methods for uncon- 
strained minimization of a convex quadratic spline function /(x): 

/min := inf /(x). (3.7) 

Here we assume that /(x) is bounded below on R n {i.e. / m j n > — oo). Then 
it follows from Frank- Wolfe Theorem [9, 6] that the set of minimizers of f(x ) 
is not empty. That is, X * := {x 6 R n : /(x) = f m i n ) ± 0. By using a 
globed error bound for approximate solutions of a convex piecewise quadratic 
program, we establish global error estimates for iterates generated by the 
proposed conjugate gradient methods. As a consequence, we prove the linear 
convergence of an analogue of the classical conjugate method when f(x) is 
strictly convex. If, in addition, the strictly convex quadratic spline /(x ) 
is twice differentiable at its unique minimizer x*, then the analogue of the 
classical conjugate gradient method finds x* in finite iterations. 

First we formulate a class of conjugate gradient methods based on a 
sequence of positive definite matrices {D k }. 

Algorithm 13 For any given x° and p° = 0 in R n , generate a sequence of 
iterates as follows: 

(13.1) let r k := -f'{x k ); 

(13.2) choose a positive definite matrix D k ; 

(13.3) set p k := - if p k ^ 0 or := 0 if p k = 0; 

(13-4) compute the descent direction p k+1 := r k + fi k p k ; 

(13.5) find the step size t k > 0 such that (p k+1 ) T f'(x k + t k p k+1 ) = 0; 

(13.6) set x k+1 := x k + t k p k+1 . 

Remark. In general, we choose D k := whenever H(x k ) is positive 

definite. This is always possible if f(x) is strictly convex. When f"(x k ) is only 
positive semidefinite, we might choose D k = H(x k ) + el as an approximation 
of H(x k ), where e is a fixed positive constant and I is the identity matrix. 
In the case that f(x) is a strictly convex quadratic function, one can easily 
verify that the above method, with D k = f"(x k ), is the classical conjugate 
gradient method. 
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Theorem 14 Assume that there exists a positive constant e such that 

e\\x \\ 2 <x T D k x<t- 1 \\x \\ 2 

for x € R n and k = 0,1,-**. Let { x k } be the sequence of iterates generated 
by Algorithm IS. Then there exist two positive constants 8 = 8{f,t) and 
7 = 7 (/) such that 

dist (x k ,X*) < 7 • (/o + y/To) ^ + fo p) f ° r k ~°' ( 3 ' 8 ) 

where f 0 := f{x°) - /min- 

Proof. It is proved in [25] that, if / is a convex quadratic spline, then there 
exists a positive constant a (depending only on /) such that 

(nH^) - a ^^~ 

whenever p T f{x) < 0 and p T f'(x + tp) = 0 (cf. Lemma 3.1 in [25]). 

By (13.5) in Algorithm 13, we have 

(p k+1 ) T f (x k + tk P k+1 ) = 0 

and 

(p‘ +I ) r /'(*‘) = (r*) T /'(x*) + A(p‘)7'(z‘) = -||r*f < 0. (3.9) 

Therefore, 

(V +I ) T /'(**))’ 

v iip* +, ii t 

Since p T D k p > e|[p[[ 2 , we derive that 
\\p k+l \\ 2 < -(p k+1 ) T D k p k+1 

= i (( r k ) T D k r k + 2 p k (r k ) T D k p k + p 2 k (p k ) T D k p k ) 

= I (( r k ) T D k r k - p 2 {p k ) T D k p k ) 

< l -{{r k ) T D k r k ), 


< a (/ (x fc ) - / (s fc+1 )) . (3.10) 
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where the last equality follows from the definition of /?*,. Since r T D k r < 
i||r|[ 2 , by the above inequality, we obtain 



It follows from (3.9) and (3.11) that 


(/ +l ) T /'(**) = \r (**) f > £ 2 



By (3.10) and (3.12) we get 




(/ (**) - / (x‘+‘)) . 


(3.11) 


(3.12) 


(3.13) 


Since f(x) is a convex quadratic spline, by Theorem 2.2 in [19], (3.13) implies 
that there exists a positive constant S (depending only on / and ^-) such that 


f ( X>C ) fmm — f° ^ (1 + /o) 2 ) ’ 


(3.14) 


where / 0 := f(x°) - f m - m > 0. 

By the global error estimate for a feasible solution of a convex piecewise 
quadratic program (cf. Corollary 2.8 in [15]), there exists a positive constant 
7 (depending only on /) such that 


dist (x‘,X") < 7 (/ (**) - /min + \/f (**) ~ /min) • 
By (3.14), we have 

S \fh (i - - <n//o(i- 4 


(3.15) 


(1 + /o)/ 

It follows from (3.14), (3.15), and (3.16) that 


(l + /o) 2 , 


. (3.16) 


dirt («*,*•) < 7 (/. + l/fe) (l ~ (1 / /o): , ) ■ 


This completes the proof of Theorem 14. 
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Remark. The restriction on {D k } means that {D k } is a sequence of uni- 
formly bounded matrices and is also bounded aways from the set of singular 
matrices. It is easy to see that, if D k, s are chosen from a finite collection 
of positive definite matrices, then { D k } satisfies the assumption made in 
Theorem 14. In particular, D k := H(x k ) satisfies the assumption made in 
Theorem 14 if /(x) is strictly convex. 

Note that, if f(x) has a unique minimizer x*, then dist(x fc , X*) = ||x fc — x*|| 
and the sequence of iterates |x fc | converges linearly to the minimizer x* of 
/(x). Theoretically, regardless of the choice of D k ' s, {x k } always converges 
linearly to x*. However, a better choice of D k ' s may increase the value of 
8 and speed up the convergence of {x k }. The next theorem shows that 
D k := H(x k ) is generally a sound choice for D k, s, especially if /(x) is twice 
differentiable at x*. 


Theorem 15 Suppose that /(x) is a strictly convex quadratic spline and x* 
is the unique minimizer of f(x). For k> 1, define 



/, 

H(x k ), 


if H{x k ) ± H{x k ~') 

if f\x k -') ± f'(x k ) + H(x k )(x k ~ l - x k ) 
otherwise. 


Then the iterates {x k } generated by Algorithm 13 converge linearly to x*. If, 
in addition, f"(x*) exists and H(x) = f"(x) whenever f"(x) exists, then the 
iterates {x k } converge finitely to x*. That is, x k = x* when k is large enough. 

Proof. Since there axe only finitely many distinct _ff(x fc )’s, D k, s satisfy the 
assumption of Theorem 14. Thus, the iterates {x*} converge linearly to the 
unique minimizer x* of /(x). Since f"(x*) exists, by Taylor expansion, we 
have 

/(*) = f(x‘) + (/'(i*)f(z - x*) + hx - X*) T /V)(* - *•) + 0(11* - *’f). 

(3 ' 17) 

If Wi contains x*, then there exist a vector h and a matrix B such that 
/(x) = /(*•) + h T (x - X*) + l -{x - x*) t B(x - x*) for x 6 W iy (3.18) 
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since /(x) is quadratic on Wi. For any x € W; , let x t = x* + f(x — x*). Then 
x t eWiiorO<t< 1. It follows from (3.17) and (3.18) that, for 0 < t < 1, 

th T (x - x*) + ~(x - x*) T B(x - x*) = f{x t ) 

= t(f(z‘)) T (z - X*) + - x-ff"(x’)(z - x*) + o(t 2 ||x - x*|| 2 ). 

As a consequence, for any x € Wi, 

h T (x - x*) = (/'(x*)) r (x - x*), 

(x - x*) T B{x - x*) = (x — x*) r /"(x*)(x — x*). 

The above argument shows that, if x* 6 Wi, then 

/(x) = f(x*) + (/'(x*)) T (x - x*) + ~(x - x*) T /"(z*)(* “ **) f° r x € Wi. 

(3.19) 

Let W* := Ur*€W, W. Let S t := dist(x*, Wi) and 5 := min{£ : x* £ Wi}. 
Since Wi axe closed, 6 > 0. From the definition of 6 we know that x € W* 
if |[x — x* || < 6. Therefore, x* is in the interior of W*. Moreover, by (3.19), 
/(x) is actually a quadratic function on W*. 

Since x* is in the interior of W* and {x*} converge to x*, there exists an 
integer ko > 0 such that x k is in the interior of W* for k > k 0 . Since /(x) 
is a quadratic function on W*, /"(x) = f"{x*) for x in the interior of W*. 
Thus, for k > ko, 

D* = J?(x‘) = f"(x k ) = /»(**), (3 2Q) 

/'(x 1 - 1 ) = /'(**) + - x k ). 1 ■ 

Let k* be the smallest nonnegative integer such that (3.20) holds for k > k * . 
Then, by the definition of D k , we get D k * = I. Since (p fc *) T r** = 0, we have 
p fc,+1 = r k * . Therefore, {x*}fc>jt« is a sequence of iterates generated by the 
classical conjugate gradient method applied to the strictly convex quadratic 
function 

g(x) := /(**) + (x - x*) T f (*•) + |(x - x*) r /"(x*)(* - *•) 

= /(**) + \(x - x*) T f"(x*)(x - X*). 
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Thus, we have x k = x* for k > k* + n. 

Remark. Note that ( p k ) T r k = 0 for all k. Therefore, if D k = 7, then 
pk+i _ r k con j U g a te gradient algorithm automatically restarts at x k . 

The condition for D k — H(x k ) is that /(x) is the same quadratic function 
at x k and x k ~ l . Suppose that f{x) has a unique quadratic representation 
on each polyhedral region Wj. Then the choice of D k guarantees that the 
conjugate gradient algorithm restarts whenever the new iterate enters a poly- 
hedral region different from the current one. It seems appropriate to restart 
in a new region since /(x) has a new quadratic form there. The only prob- 
lem is that it degenerates to the steepest descent method if the iterates keep 
moving from one region to another region. However, the twice differentiabil- 
ity of /( i) at the solution ensures that the iterates will stay in one region 
eventually. In general, it is not easy to decide when to restart the conjugate 
gradient method. Powell [29] did a thorough analysis on numerous restart 
strategies, including Fletcher and Reeves’s restart in n or (n + 1) iterations 
[7] and Beale’s restart procedure [3]. Without restarts, Powell [24] shows 
that the conjugate gradient method usually has a linear rate of convergence. 

In application, it is convenient to use i7(x), the Hessian of /(x) restricted 
in some W; containing x, instead of f"(x) (cf. the next section). If f"(x) 
exists and Wi has nonempty interior, then H(x) (= f"(x)) is independent of 
the choice of Wi. However, if W t has no interior points, then W, is contained 
in an (n — l)-dimensional hyperplane in R n . In this case, there are many 
different quadratic functions whose restriction on Wi are the same as the 
restriction of /(x) on Wi- This means that H(x ) might not be the same 
as f"{x) even though f"{x) exists. For example, consider /(x i,x 2 ) = (xi + 
x 2 )+ + A(A — 1 — x\ — x 2 )+, where A is a fixed constant satisfies 0 < A < 1. 
Then it is natural to consider / as a convex quadratic spline over 4 convex 
polyhedral regions: 


w : 

:= {(xi,x 2 ) : 

(xi + x 2 ) > 0,(A - 

1 

-Xi- 

X 2 ) 

>0}, 

w 2 

:= {(xi,x 2 ) : 

(xi +x 2 ) > 0,(A - 

1 

-Xi- 

x 2 ) 

O 

VI 

w 3 

c* 

II 

(xi +x 2 ) < 0,(A - 

1 

- Xi - 


>0}, 

w 4 

:= {(xi,x 2 ) : 

1 

-s 

O 

VI 

C* 

+ 

r—t 

1 

- Xi - 

X 2 ) 

<0}. 

Obviously, for 0 

< A < 1, Wt 

is empty and the Hessian of j 

' on 

W 4 is 
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However, if A = 1, then /( xi,xi) = (xi + X 2) 2 and f"{ x ) = 2 ) ' 

case, Wi = W 4 and all the regions contain the origin. However, if we use 
H( 0, 0) = ^ ^ as the Hessian of / on W 4 (which degenerates to a point), 

then H{ 0,0) ^ /"(0,0). When a polyhedral region Wi is defined by linear 
inequalities Ax > b , Wi has nonempty interior if and only if Ax > b satisfies 
the Slater condition (i.e., Ax > b for some x). There is no simple way 
to check whether a polyhedral region has interior point or not. Thus, we 
can not expect that our choice of Wi guarantee H(x) = f"(x ) when / is 
twice differentiable at x. That is the reason why we have to assume that 
H(x ) = /"(x) in the above theorem. However, if all Wi have nonempty 
interior, then H(x) = f"{x) whenever f"(x) exists (cf. the proof of Theorem 
17). 

3.3 Strictly Convex Quadratic Programs with 
Simple Bound Constraints 

In this section we outline how the strictly convex quadratic program with 
simple bound constraints, (3.6), can be reformulated as unconstrained mini- 
mization of a strictly convex quadratic spline \?(x) [24, 25]. As a consequence, 
we have an analogue of Theorem 15 for (3.6). An very interesting result is 
that \&(x) is twice differentiable if (3.6) has a nondegenerate solution (cf. the 
proof of Theorem 17). 

It is well-known from the Karush-Kuhn-Tucker conditions that x is a 
solution of (3.6) if and only if there exists t 0 E R n such that, for 1 < i < n, 

Mx — b — w = 0, 

Xi = L if u>i > 0, , 

■ : ( 3 . 21 ) 

Xi = Ui if Wi < 0, 

li < Xi < Ui if Wi = 0. 

Based on the above Karush-Kuhn-Tucker conditions, one can verify that x* 
is a solution of (3.6) if and only 

x* = (x* - aw*))*, (3-22) 
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where w* = Mx* — b and a is any positive constant (cf. [25]). Here (z); (or 
(z) u ) is the lower (or upper) truncation of z by l (or u) whose i-th component 
is max{Zi,z;} (or min{iti, zj). By substituting w* = Mx* — b into (3.22), 
we observe that (3.6) is equivalent to a system of piecewise linear equations. 
However, by multiplying a special nonsingular matrix on both sides of (3.22), 
one can prove that (3.22) is equivalent to the normal equation of a convex 
quadratic spline (cf. [25]). The following lemma follows from Theorem 2.6 
in [25]. 

Lemma 16 Suppose that M is symmetric positive definite. Then x* is a 
solution of (S.6) if and only if x* satisfies the following piecewise linear equa- 
tion: 

x = (Ex + A.)“, 

where E := I — aM, a is any positive constant with a||M|| < 1, \\M\\ denotes 
the spectral radius of M , and h := ab. Moreover, E(x — (Ex + h)f) is the 
gradient of the following strictly convex quadratic spline 

¥(*) := i||(/ - C-E* + A))+|| 3 + + A) - m) + ||*. 

As a consequence, x* is a solution of (S.6) if and only if x* is the unique 
minimizer of' i 6(x). 

Before applying Algorithm 13 to f(x) := \&(x), we give an explicit formula 
for H(x). For any x 6 R", let <r(x) be the diagonal matrix whose i-th 
diagonal element <ru(x) is defined by the following formula: 

,«(*) := { if k < (EX + h)< < Ui (3.23) 

^ } X 0, if (Ex + h)i < h or (Ex + h)i > Ui. 

Then one can verify that, if \P"(x) exists, then 

*"(*) = E - Ecr(x)E. 

Thus, we define 

H(x) :=E- Ea(x)E. (3.24) 
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One may consider H(x k ) as the Hessian of 'Jj(x) restricted on the closed 
convex polyhedral set 

W(x k ) := {x e R n : (Ex + h)i > Ui for i € I u (x k ), (Ex + h)i < U 

(3.25) 

for i € h(x k ), and U < (Ex + h), < U{ for i € Io(x k )}, 


where 

I u (x k ) := {i : (Ex k + hfi > «<}, 

h(x k ) := {i : (Ex k + h)i < U}, (3.26) 

I 0 (x k ) := {i:U< (Ex k + h)i < uj. 

With D k := H(x k ) and f'(x) := Ex - E(Ex + h)f in Algorithm 13, we 
obtain a linearly convergent conjugate gradient algorithm for solving the 
strictly convex quadratic program with simple bound constraints. Moreover, 
the nondegeneracy assumption of (3.6) implies the finite termination of the 
conjugate gradient method. Recall that a solution x* of (3.6) is nondegener- 
ate if an only if 

x*i = h if (Mx* — b)i > 0, 

x* = u, if (Mx* — b)i < 0, (3.27) 

li < x* < Ui if (Mx* — b)i = 0. 

Theorem 17 Suppose that M is a positive definite matrix. For any vector 
x° in R n and p° := 0, generate a sequence of iterates {x k }, k = 1,2, •••, as 
follows: 

(17.1) let r k := E(Ex k + h)f - Ex k ; 

(17.2) ifW(x k ) ^ W(x fc-1 ) or p k — 0, set fa '■= 0; otherwise, set 

(r k ) T Ep k - ( Er k ) T <r(x k )(Ep k ) _ 

^ (p k ) T Ep k - (Ep k ) T a(x k )(Ep k ) ; 

(17.3) compute the descent direction p k+1 := r k + fap k ; 

(17. 4) find tk > 0 such that 

(Ep k+1 ) T (x k + t kP k+1 - (Ex k + h + t k Ep k+1 )?) = 0; 

(17.5) set x k+1 := x k + t k p k+1 . 

Then {x fc } converge linearly to the unique solution x* of (3.6). Moreover, if 
x* is a nondegenerate solution of (3.6), then x k = x* when k is large enough. 


52 



Proof. Note that, if M is positive definite, then E is a positive definite 
matrix with eigenvalues in the interval (0,1). Thus, E — E 2 is a positive 
definite matrix. As a consequence, x T {E — Ea{x k )E)x > x T {E — E 2 )x > 0 
for x ^ 0. Therefore, D k := H{x k ) is positive definite (cf. [25]). Note that 
^(i) is the same quadratic function on W{x k ) and W(x* _1 ) if and only if 
W{x k ) = W(x k ~ l ). Therefore, one can verify that the steps (17.1)— (17.5) 
generate the same iterate as Algorithm 13 with D k defined as in Theorem 
15 and f'(x) = Ex - E{Ex + h)f. Hence, by Theorem 14, {x fc } converge 
linearly to x*. Note that, if f"{x k ) exists, then, by the proof of Theorem 15, 

(x - x k ) T H{x k ){x - x k ) = (x - x k ) T f"{x k ){x - x k ) for x € W{x k ). (3.28) 

Since E is positive definite, W{x k ) has nonempty interior and {x — x k : x € 
W{x k )} spans R n . Therefore, (3.28) implies H(x k ) = f"{x k ). Moreover, 
H{x k ) = H{x k_1 ) and To complete the proof, we only need to show that 
$'(x) = Ex — E{Ex + h)f is differentiable at x* if x* is nondegenerate, since 
Theorem 15 implies the finite convergence of {x fc }. 

Let x* be a nondegenerate solution of (3.6). Then (cf. (3.25)) 

intW(x*) = {x : ( Ex + h)i < U for i G //(x*), ( Ex + h)i < U 

for i € /j(x*), and U < {Ex + h)i < Ui for i € /o(z*)}- 

By (3.27), for any index i, {Ex* + h)i = (x* — a{Mx* — b))i can not equal 
to h or Ui. Therefore, x* € intW(x’). Since ^'(x) is a linear mapping on 
the open set intW(x*), $"(x) exists for x € intW(x*). In particular, ^"(x*) 
exists. This completes the proof of Theorem 17. 

Remark. We only need to keep a sign pattern vector S{x) for each 
polyhedral region W{x): Si{x) = 1 if {Ex + h)i > Si{x) = 0 if k < 
{Ex + h)i < Ui , and 5<(x) = -1 if {Ex + h)i < U. With the storage of ^(x^ 1 ) 
and S{x k ), we can easily check whether or not W{x k ) = W(x fc_1 ), since 
W{x k ) = W{x k ~ 1 ) if and only if ^(x^ -1 ) = S{x k ). Also we can use a linear 
time algorithm to find the step size tk (cf. [4, 5, 13, 24]). Therefore, the most 
expensive operations in each iteration is the matrix- vector multiplications. 
Note that there are only four matrix-vector multiplications involved in each 
iteration: 

Ex k ,E{Ex k + h)?,Ep k ,Er k , 

since Ep k+1 = Er k + f3 k Ep k . Therefore, roughly speaking, the computational 
cost of each iteration is about four times of that of the classical conjugate 
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gradient method for solving the linear system Mx — 6 = 0. Also note that 
the reformulation does not change any sparse pattern of M . 

Finally, we want to say a few words about H(x k ) defined by (3.24). We 
can also define cr(x) in the following way: 



1, if k < (Ex + h)i < Ui 

0, if (Ex + h)i < U or (Ex + h)i > u;. 


(3.29) 


Then Theorem 17 still holds. However, the choices of a(x) are related to 
how one wants to treat the current expected active constraints. See [25] for 
details. 


3.4 Conclusions 

Based on a conjugate gradient method for unconstrained minimization of a 
strictly convex quadratic function, we derived a conjugate gradient method 
for strictly convex quadratic programs with simple bound constraints. We 
also established a global error estimate for iterates which implies the linear 
convergence of the iterates. In the case that the solution of the original 
quadratic program is nondegenerate, then the conjugate gradient method 
terminates in finite iterations. The conjugate gradient method can start at 
any point which may be infeasible. The computational cost for each iteration 
is 0(n 2 ) flops, proportional to the classical conjugate gradient method. The 
study indicates that the strictly convex quadratic program with simple bound 
constraints (3.6) is very similar to the linear system Mx = 6. Extensive 
n um erical experiments shall be done to compare the proposed method with 
other existing methods for solving (3.6). The purpose of this article is to show 
the importance of the reformulation of (3.6) as unconstrained minimization 
of a strictly convex quadratic spline and to establish a foundation for further 
study on applications of such an unconstrained reformulation. 

Note that the conjugate gradient methods discussed in [1, 2, 7, 10, 11, 
12, 22, 29] can be directly applied to an unconstrained minimization prob- 
lem with a strictly convex quadratic spline as the objective function. It is 
interesting to know what convergence result one may derive for this special 
case. 
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Chapter 4 

Differentiable Exact Penalty 
Functions via 

Hestenes-Powell-Rockafellar’s 
Augmented Lagrangian 
Function 


We study differentiable exact penalty functions, depending only on x, de- 
rived from Hestenes-Powell-Rockafellar’s quadratic augmented Lagrangian 
function for a minimization problem with two-sided inequality constraints 
by using Fletcher’s Lagrangian multiplier estimate. We also consider new 
penalty functions, depending only on the Lagrangian multiplier, derived from 
the augmented Lagrangian function. These penalty functions are particularly 
useful for quadratic programming problems. 


4.1 Introduction 

Consider the following constrained minimization problem: 

min /(x) subject to 1 < g(x) < u, (4-1) 

where / : R n — ► R and g : R n — * R m axe twice continuously differentiable 
functions. Here we assume l < u and some components of l and u may be 



— oo and oo, respectively. The corresponding augmented Lagrangian function 
L(x,y,a ) introduced independently by Hestenes [17, 18] and Powell [29] for 
equality constraints and by Rockafellar [30, 31] for inequality constraints can 
be written in the following unified way: 


L(x t y,a) :=/(*) + f (£($(*) - u) + y) + 


+ - 
' 2 


(i ( l - $(*)) - y) 


+ 


- f l|y|| 


(4.2) 


where y is the Lagrangian multiplier corresponding to two-sided inequality 
constraints, a is a penalty parameter, and (z)+ is the vector whose ith com- 
ponent is max{0,Zi}. The idea of using one Lagrangian multiplier for two- 
sided inequality constraints was first proposed by Bertsekas [3, 1,2]. Along 
with complementarity conditions, the Lagrangian multiplier y satisfies the 
following equation: 

/'(*) + 9'{x)y = 0 ) ' ( 4 - 3 ) 

where f'{x ) is the gradient of /(x) whose zth component is J^- and g'{x) is the 
Jacobian of g{x) whose ith column is the gradient of gi(x). £br equality con- 
strained minimization problems (/ = u — 0), the corresponding augmented 
Lagrangian function is reduced to the following form: 


(X 

L(x,y,a ) = f{x) + - 


(l , . \ 

2 

a 

/ i . . \ 

-g(x) + y 

+ 77 

( — g(x) - y ) 

\a J + 

2 

\ a / + 


2 


a 

2 


2 


L(x,y, a) = /(*) + y T g(x ) + ^|b(x)|| 2 , (4.4) 

since 


/ 1 \ 

2 

/ 1 , , \ 

2 

l 

l a’W + '>) + 

+ 

(--j(*)-y) + 

— 

-g{x) + y 
a 


For equality constraints, it is natural to assume that g'(x) has rank m for 
every x and Fletcher proposed to use the following estimate for the multiplier 

y(x) := -(g'{x) T g'(x))- 1 g'(x) T f(x). (4.5) 

Substituting y in (4.4) by y(x) given in (4.5), Fletcher obtained a penalty 
function L(x,y(x),a) depending only on x [10]. Glad and Polak [12] con- 
sidered that Fletcher’s idea was very good, except for two shortcomings. 
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“The first was that he did not know how to find automatically a satisfactory 
value of the penalty a , while the other was that his extension of his for- 
mula to problems with inequalities [11] results in discontinuous derivatives 
in the augmented Lagrangian, which caused algorithms to jam.” Therefore, 
for mixed equality and inequality constraints, say li = Ui = 0 for 1 < i < k 
and k = — oo, Uj = 0 for k + 1 < i < m, Glad and Polak proposed a new for- 
mula for the multiplier y. Glad and Polak’s multiplier function was designed 
to minimize the violation of Karush- Kuhn- Tucker conditions [12]: 

771 

min ||/'(x) + g'(x)y\\ 2 + ( 4 - 6 ) 

y i=fc+i 

where 7 is a nonzero scalar. Under the linear independence constraint qual- 
ification, (4.6) is a strictly convex quadratic program and its solution y*(x ) 
can be explicitly written as follows: 

»♦(*) = -(s'MVM + 7 2 g'MTV0'0 t /'M, (4-7) 

where G(x ) is the diagonal matrix whose ith diagonal entry is 0 for 1 < i < k 
and <7,(x) for k + 1 < i < m. 

Comparing Fletcher’s multiplier y(x) with Glad and Polak’s multiplier 
t/*(x), we realize that y(x) is actually equal to y*(x) with 7 = 0, when g'(x ) 
has rank m. The matrix 7 2 G(x) 2 is somehow a regularization factor and, as a 
consequence, y*(x) is differentiable under the linear independence constraint 
qualification while y(x) is only differentiable under a much stronger condi- 
tion that the gradients g[(x ), • ■ ■ ,g' m (x) are linearly independent. Therefore, 
Glad and Polak’s multiplier y*(x) can be applied to more general cases than 
Fletcher’s multiplier y(x). However, within the unified framework for equal- 
ity and inequality constraints, the restriction of Fletcher’s multiplier y(x) is 
due to its stronger requirement on the constraint qualification instead of the 
types of constraints. Even for two-sided inequality constraints l < g(x) < u, 
there is no reason not to use Fletcher’s multiplier y(x) if g'(x) has rank m. In 
fact, in this case, y(x) is much simpler than y*(x) and results in a much better 
penalty function L(x,y(x),a). For example, for linear inequality constraints 
such as simple bound constraints on the variable x and a quadratic objective 
function f(x ), y*(x) is a rational function of x while y(x) is a linear function 
of x. As a consequence, it becomes extremely difficult to find a penalty pa- 
rameter a that makes L(x,y*(x),a) an exact penalty function. In general, 


61 



one can only define the penalty parameter a to get an exact penalty function 
L(x, j/*(x), a) with respect to a compact set [7]. Lucidi and Grippo had to 
incorporate barrier terms into the augmented Lagrangian function to get a 
global exact penalty function for quadratic programs with simple bound con- 
straints [13, 14]. On the contrary, for quadratic programs with simple bound 
constraints, the differentiable penalty function L(x , t/(x), or) is a differentiable 
piecewise quadratic function which is a strongly exact penalty function with 
respect to R n (a noncompact set) and the penalty parameter a can be eas- 
ily estimated by using the spectrum radius of the Hessian of the quadratic 
objective function [21] (cf. also Proposition 29). In fact, for quadratic pro- 
grams with linearly independent two-sided inequality constraints, Fletcher’s 
approach of constructing a penalty function depending only on x becomes an 
excellent idea without the two shortcomings pointed out by Glad and Polak 
(cf. Section 3). 

Moreover, as an extension of Fletcher’s idea, we propose to construct a 
penalty function L(x(y),y,a) depending only on the Lagrangian multiplier 
y if (4.3) always has a unique solution x(j/). This approach is suitable for a 
quadratic programming problem whose objective function has a nonsingular 
Hessian (cf. Section 4). 

It is interesting to note that L(x,y(x),a) or L(x(y),y,a ) can also be 
derived as a reformulation of the Karush-Kuhn- Tucker conditions of (4.1). 
This was the approach used for deriving the exact penalty functions ^(x) 
and $(y) for quadratic programs without knowing that they are actually 
L(x, t/(x), o:) and L(x(y),y,a) [24, 25] (cf. also Sections 3 and 4). 

The paper is organized as follows. General properties of the augmented 
Lagrangian function L(x,y,a) for two-sided constraints are given in Section 
2. The differentiable exact penalty functions L(x,y(x),a) and L(x(t/), y, a) 
are discussed in Sections 3 and 4, respectively. Final comments are included 
in Section 5. 

Now we give some terminologies and notations used in the paper. 

For simplicity, we use f'(x) to denote the gradient of /(x) (as a column 
vector) and use f"(x) to denote the Hessian of /(x). A real-valued function 
/(x) on R n is said to be a quadratic spline, if the gradient f'(x ) of /(x) is 
a piecewise linear mapping from R n to R n . That is, a quadratic spline is 
a continuously differentiable piecewise quadratic function. The 2-norm || • || 

on R n is defined as |[x|[ := (SJLi^f) 2 and the 2-norm of an n x n matrix 
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B is defined as |[B|[ := sup{||f?x|| : x € R n with ||x|| = 1}. The transpose 
of a matrix B or a vector x is denoted by B T or x T . For x, y € R n , x < y 
means X( < yi for 1 < i < n, where x, or yi denotes the i-th component 
of x or y. Let (z)j* be the lower and upper truncation of z by / and u, 
respectively, whose i-th component is max{li,min{ui, z<}}. By convention, 
z+ is a vector whose i-th component is max{2i,0}. A vector x* 6 R n is said 
to be a local solution of (4.1) if l < g(x*) < u and there exists a positive 
constant 8 such that f(x) > /(x*) whenever l < g(x) < u and ||x — x*|| < 8. 
A vector x* € R n is said to be a global solution of (4.1) if l < g{ x *) < u sJid 
f( x ) > f( x *) whenever l < g(x) < u. A mapping x = x(io) from R fe to R n 
is said to be an open mapping if x(-) maps open sets in R* to open sets in 
R n (i.e., {x(ui) : w € U} is open whenever U is an open subset of R fc ). It is 
not difficult to verify that x(-) is an open mapping if and only if, for any w* 
and 8 > 0, there exists a positive constant t such that 

{x : |[x — x(u)*)|[ < e} C (x(io) : |[to — iw*|[ < 5). 

A mapping x = x(u>) from R fc to R n is said to be onto if its range is R n (Le., 
for any x € R n , there exists w € R fc such that x(iw) = x). The following 
definition of exact penalty functions was commonly used in literature (cf. 
[15]) and was formally given by Di Pillo and Grippo [7]. 

Definition 18 Let F(x) be a function from R' 1 to R. Then F(x) is said to 
be an exact penalty function of (f.l) with respect to a subset T> of R n , if x* 
is a local (or global) solution of (4-1) whenever x* E T> is a local (or global) 
minimizer of F(x). 

4.2 Some Properties of the Augmented La- 
grangian Function 

Most properties of the augmented Lagrangian function L(x, y, ct) given in 
this section axe well-known for equality and one-sided inequality constraints 
(cf. [3]). Even though Bertsekas used one multiplier for two-sided inequality 
constraints in design of numerical methods for solving constrained minimiza- 
tion problems, he did not explicitly use the formula (4.2) as the augmented 
Lagrangian function for two-sided constraints [3, 1, 2]. Therefore, we give 
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complete proofs of these well-known properties of the augmented Lagrangian 
function here. 

First we want to reformulate explicitly the Kaxush-Kuhn- Tucker condi- 
tions of (4.1) as a system of nonlinear equations. Special cases of the follow- 
ing lemma were given by Mangasarian (Lemma 2.1 in [27]), Glad and Polak 
(Lemma 1 in [12]), and Li and Swetits (Lemma 2.2 in [25]). 

Lemma 19 For any a. > 0, (x,y) is a Karush- Kuhn- Tucker point of (4-1) 
if and only if 


f\x) + g'(x)y = 0 and g{x) = ( g(x ) + ay)“ . (4.8) 

Proof. For any given point (x,y), ( x,y ) is a Karush-Kuhn-Tucker point of 
(4.1) if and only if (x, y) satisfies the following conditions: f'{x)-\-g'(x)y = 0 
and 

y,(x) — h if j/t < 0, 

gi(x) = Ui if yi > 0, (4.9) 

h < gi{x) < Ui if yi = 0. 

It is not difficult to verify that, for any given a > 0, (4.9) is equivalent to 
the system of nonlinear equations: g[x) = ( g(x ) + ay)f. ■ 

The following property of the augmented Lagrangian function L(x,y,a) 
is well-known and is fundamental for deriving exact penalty functions from 
L(x,y,a). 

Lemma 20 For any a > 0, if l < g(x) < u and y € R”\ then 

L(x,y,a) < f(x). (4.10) 

The equality in (4-10) holds if (x,y) is a Karush-Kuhn-Tucker point of (4 ■ 1) ■ 
Proof. Let l < g(x) < u. If L(^(x) — uf) + yi > 0, then 

0 < -(&(x) - Ui) + yi<yi 

a 

and ^ ^ 

-(/,• - gi{x )) - yi = -{h - u^ - i-(gi(x) -u^ + yA < 0. 
a a \a / 
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(4.11) 


As a consequence, we have that 

(t( l i ~ #(*)) - y<) + • (i(ft( x ) “ u 0 + »<)+ = °» 

(t(M x ) " u ‘) + Vi) 2 + + (i(fc - #(*)) - y«)+ - y ‘ ? - 

Similarly, (4.11) holds if £( U - g%(x)) - y< > 0. Moreover, (4.11) trivially 
holds if both £(/< - g t (x)) - y< < 0 and £(#(*) - u<) + y< < 0. Therefore, it 
follows from the inequality in (4.11) that 

L(x,y,a) = f(x) + f E£i (s(ft(*) “ «0 + tt)* 

+f ££=i (i(fc - *(*)) - w)+ - f * /( x )- 

It follows from the equality in (4.11) and ^(z)+ = (a z ) + that 



&(»(*) -«) + y ) + | 8 + f | (ia-j(*))- 


(i(j(«) - «) + ») + - (tO ~ ~ v)+ 

||((p(*) - «) + <*v)+ - (0 - ff( x )) - a y)+| 


») 


2 

2 


+ 


2 


Since (z) ? = z + (f - z)+ - (z - u) + , for z = 5 ( 2 ) + ay, we obtain 


(4.12) 


(g(x) + ay)? = g(x) + ay + ((/ - y(x)) - ay) + - ((g(x) - u) + ay) + . (4.13) 


If (x,y) satisfies the Karush-Kuhn-Tucker conditions, by Lemma 19 and 
(4.13), we get 

ay = i(g( x ) - u ) + ay)+ - (0 ~ d( x )) - ay) + . (4-14) 


It follows from (4.12) and (4.14) that L(x,y,a) = /(x). ■ 


Lemma 21 Suppose that x — x(w) and y = y(w) are differentiable mappings 
of w and a > 0. Then 

L' w (x(w),y(w),a) = x'(w)(f'(x ) + y'(x)y) ^ 

+ ±(x'(w)g'( x ) + ay'(w )) (g(x) - (g(x) + ay(x))} 1 ) . 

Moreover , the following statements are true: 
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1. If (x(w),y(w)) is a Karush- Kuhn- Tucker point of (4-1), then 

L' w {x(w),y(w),a) = 0. 

2. Suppose that x(w) is an open and onto mapping. If (i(u>*), y{w*)) 
is a Karush- Kuhn- Tucker point of (4-1) and w* is a local (or global) 
minimizer of L(x(w),y(w),a), then x{w *) is a local (or global) solution 

of (4-V- ' 

Proof. By the chain rule, we have that 

^«(*( w 0>v( u 0» o 0 = x'(w)f(x) - ay'{w)y 

+a (hx'{w)g'{x) + y'(wj) fag(x) “ u ) + v) + (4.16) 

-« (i x W( x ) + y '( w )) (K* ~ ^( z )) _ y) + • 

By (z)“ = z + (l — z) + — (z — u) + and = ^(z)+, we obtain that, for 

z = g(x) + ay, 

(M*) - u ) + y) + - - y( x )) - v) + (4 17) 

= l ($(*) - (g(x) + ay)? + ay) . 

The formula (4.15) follows from (4.16) and (4.17). 

If (x(w),y(w)) is a Karush-Kuhn-Tucker point of (4.1), by Lemma 19, 
we have f'(x) + g'(x)y = 0 and g(x) = ( 5 ( 2 :) + a:t/)“. Thus, by (4.15), 
L' w (x{w),y{w),a) = 0. 

Now suppose (x(io*), y(w*)) is a Karush-Kuhn-Tucker point of (4.1). If 
w* is a local (or global) minimizer of L(x{w),y(w),a ), then there exists a 
positive constant 8 such that 

L(x(w),y(w),a) > L(x(w*),y(w*),a) for ||tu — io*|| < 8. (4.18) 

Since x{w) is an open mapping, there exists a positive constant e such that 

{x € R n : ||x — x(tt>*)|[ < e} C (x(to) : ||u> — io*|( < 6}. (4.19) 

Now, let x £ R n be such that ||x — x(u;*)|| < t and l < g(x) < u. By (4.19), 
there is w such that x(w) = x and |(t£> — u>*|| < 8. By Lemma 20 and (4.18), 
we get 

f(x) = f{x(w)) > L(x(w),y(w),a) > L(x(w*),y{w*),a) = f(x(xv*)). 
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IS 


Thus, x(u>*) is a local solution of (4.1). 

Finally assume that w* is actually a global minimizer of L(x(w), y(w), or). 
For any x with l < g(x) < u, there is u; such that x = x(tn), because x(w) is 
an onto mapping. Thus, 

/(x) = f(x(w)) > L(x(u>),p(u>),a) > T(x(uj*),t/(in*),o;) = f(x(w*)) 
and x(tu*) is a global solution of (4.1). ■ 

It is interesting to note that one necessary condition for a differentiable 
penalty function F(x) being exact is that the Karush-Kuhn-Tucker condi- 
tions of (4.1) can be derived from the n equations F'(x ) = 0. In general, this 
is possible only if the multiplier y is uniquely determined by x. Therefore, it 
becomes clear why the extended Mangasarian-Fromovitz constraint qualifica- 
tion is a standard condition for differentiable exact penalty functions. In gen- 
eral, one might not be able to derive the m equations g(x) = (g(x) + ay(x))f 
in the Karush-Kuhn-Tucker conditions from the n equations F'(x) = 0 if 
m> n. 


4.3 Exact Penalty Function in Primal Vari- 
ables 

In this section, we discuss the penalty functions L(x,y(x), a) derived from 
Hestenes-Powell-Rockafellar’s augmented Lagrangian function by using 
Fletcher’s multiplier y(x). We assume that g'(x) has rank m and 

y(x) = -{g'{x) T g'{x))~ l g\x) T /'(x). 

Note that, for w = x, x'(w) is the identity matrix. The following lemma 
is an immediate consequence of Lemma 21. 

Lemma 22 For any x 6 and y 6 R m , 

L' x (x,y(x),a) = f'(x) + g'(x)y(x) 

+ £ (g'(x) + aj /(*)) (g(x) - {g(x) + ay(x))}*) . 
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Lemma 23 If g'(x) T g'( x) + ag'(x) T y'(x) is nonsingular for x in a subset T> 
of R n , then L(x,y(x),a ) is an exact penalty function of (f.l) with respect to 
V. 


Proof. By the definition of y(x), we have 

9 '(x) T (f'(x) + g'{x)y{x)) = 0. 

It follows from (4.20) and (4.21) that 

\(g'{x) T g'{x) + ag'(x) T y'(x )) ( g(x ) - (g(x) + c«/(x))J*) 

= 9'( x ) T L' x {x,y{x),a). 


(4.21) 


(4.22) 


Suppose that x* € T> and x* is a local (or global) minimizer of L(x , y(x), a). 
Then L' x (x*,y(x*),a) = 0. Since x * € V, g'(x*) T g'(x*) + ag'(x*) T y'(x*) is 
nonsingular. By (4.22), we have 


9 ( x *) ~ (p(z*) + ay(®*))J* = 0. 


(4.23) 


From L' x (x*, y(x*), a) = 0, (4.23), and (4.20), we also have that f(x*) + 
g\x*)y(x*) = 0. By Lemma 19, (x*,y(x*)) is a Karush-Kuhn- Tucker point 
of (4.1). 

Note that, for w = x, x(to) = w is an open and onto mapping from 
R n to R n . Thus, if (x*,y(x*),o;) is a Karush-Kuhn- Tucker point of (4.1), 
by Lemma 21, x* is a local (or global) solution of (4.1). This proves that 
L(x, y(x), a) is an exact penalty function of (4.1) with respect to V. ■ 


Theorem 24 Suppose that T> is compact. Then there exists a positive con- 
stant a* such that , for 0 < a < a* , L(x,y(x),o;) is an exact penalty function 
of (4-1) with respect to V. 


Proof. Note that 

g'{x) T g‘(x) (j m + a (g'(xj V(x)) 1 $'(*) V(*)) 

= $'(*)V(*) + oig'{x) T y'(x ), 


(4.24) 


where I m is the identity matrix of order m. Thus, g'{x) T g'{x) + ag'(x) T y'(x) 
is nonsingular if and only if 


Im + a (g'(x) T g'(xj) 1 g'(x) T y'(x) 


(4.25) 


68 


is a nonsingular matrix. 

Since V is compact and g'(x),y'(x) are continuous mappings, we obtain 


7 := max 
' xev 


(g\x) T g\x)) 1 g'(x) T y'{x) < oo. 


Let a* = Then it is easy to verify that the matrix given in (4.25) is 
nonsingular for 0 < a < a*. Theorem 24 follows from Lemma 23. ■ 


Theorem 25 Suppose that g(x) = Ax, where A is an m x n matrix with 
rank m. If there exists a positive constant a such that 


0 < a 


( AA T )- 1 Af"(x)A T (AA r ) 


T\— 1 


< 1 


(4.26) 


for x in a subset D of R n , then L{x,y[x),oi) is an exact penalty function of 
(4-1) with respect to T>. 


Proof. Since g(x) = Ax, it is easy to see that 


y(x) = ~(AA t ) 1 Af(x) 


and 

y'(x) = -f\x)A T (AA T )~\ 

The matrix given in (4.25) can be rewritten as follows: 

I m ~ *{AA T )- l Af"{x)A T {AA T )~\ (4.27) 

If (4.26) holds for x E T>, then the matrix given in (4.27) is nonsingular 
for x € T>; and it follows from (4.24) that g'{x) T g'{x ) + ag'(x) T y'(x ) is 
nonsingular for x 6 V. Thus, by Lemma 23, L(x,y(x),a ) is an exact penalty 
function of (4.1) with respect to V. ■ 

An immediate consequence of Theorem 25 is the following result about 
an exact penalty function for quadratic programs with linearly independent 
constraints: 


min -x t Mx — b T x 

x 2 


subject to / < Ax < u, 


(4.28) 
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where A is an m x n matrix with rank m, M is an n x n symmetric matrix, and 
b £ R". In this special case, y{x ) = (AA 7 )* 1 A(M x — b ) and the augmented 
Lagrangian function has the following form: 


L(x, y, a) := \x T Mx - b T x + § (^(Ax -u) + yj 


+ - 
“ 2 


(i(*- Ax )-y) + -f 


(4.29) 


Theorem 26 Suppose that 0 < a| ( AA T ) 1 AMA T (AA T ) 1 1 < l, L(x,y,a) 
is given by (4-29), and y{x) := (AA T )~ 1 A(b — Mx). Then L(x,y(x),a) is an 
exact penalty function of (f.28) with respect to R". 


The above theorem is an attempt to extend the following unconstrained 
reformulation of a convex quadratic program with simple bound constraints. 


Proposition 27 Suppose that A is the identity matrix I n (m = n) and M 
is positive semidefinite. Let 0 < o||M|[ < 1 and 

Hx) := 1 -x T (E-E I )x-ab T Ex + ^(c-Ex) + f+ 1 -\\(Ex-d) + f, (4.30) 

where E I — aM, c := l — otb, and d := u — ab. Then ^(z) is a convex 
quadratic spline. Moreover , x* is a minimizer o/'J r (z) if and only if x* is a 
solution of the convex quadratic program (4-28). 


It is easy to verify that ^^(z) — §|[i>|[ 2 = T(z, j/(z),ar). Therefore, the 
above proposition shows a very important property of the penalty function 
T(z, j/(z), a): it preserves the convexity of the original quadratic program- 
ming problem. As an extension of the above proposition, we have the follow- 
ing equivalent unconstrained reformulation of a convex quadratic program 
with linearly independent constraints. 


Theorem 28 Suppose thatO < a (AA T ) 1 AMA T (AA T ) 1 


< l, M is pos- 


itive semidefinite, y(x) := (AA?)' 1 A(b — Mx), and L(x,y,a) is given by 
(4-29). Then L(x,y(x),a) is a convex quadratic spline function. Moreover, 
x* is a minimizer of L(x,y(x),a) if and only if x* is a solution of the convex 
quadratic program (4-28). 
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(4.31) 


Proof. Since y'(x) = —MA T (AA T ) 1 , it follows from Lemma 22 that 


L' x {x , y(x), a) = Px + p- -Q T (i Qx + ?)“ , 

a 

where p := A T (AA T )~ l Ab — b, q := a(AA T )~ 1 Ab , 

P := M + l -A T A - A T (AA T )~ 1 AM - MA T (AA T )- 1 A, 

Q := A- a(AA T )~ l AM. 

We claim that P and P — L Q T Q are positive semidefinite. 

In fact, since a > 0, it suffices to prove that P — \Q T Q is positive 
semidefinite. By simple algebraic manipulations, we obtain 



P - -Q t Q = M - aMA(AA T )-\AA T )- 1 A T M. 
a. 

Since M is positive semidefinite, there exists annxn matrix B such that 
M = B T B. Thus, 

P - -Q t Q = B T (I n - aBA(AA T )-\AA T )~ 1 A T B T )B. 

ot 


Therefore, P — ^Q T Q is positive semidefinite, if the symmetric matrix 


I n -aBA(AA T )- 1 (AA T )- 1 A T B T (4.33) 


is positive semidefinite. For any matrix D , it is easy to verify that D T D 
and DD T have the same set of nonzero eigenvalues by using singular value 
decomposition. Since the 2-norm of a symmetric positive semidefinite matrix 
is its largest eigenvalue, we have ||£> T Z)|| = ||Z)Z) T ||. Let D := BA(AA T )~ 1 . 
Then 

\\BA{AA T )-\AA T )-'A T B T \\ 

= \\{AA T )- 1 A T B T BA(AA T )- 1 \\ 

= \\(AA T )~ l A t M A(AA t )~ 1 \\. 


Since 0 < a 
matrix 


\\(AA t )- 1 AMA t (AA t )~ 1 


< 1, all eigenvalues of the symmetric 


*BA{AA T )-\AA T )~ l A T B T 
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are less than 1. Therefore, all eigenvalues of the symmetric matrix given in 
(4.33) axe positive. As a consequence, the matrix given in (4.33) is positive 
definite. This completes the proof that P — L Q T Q is positive semidefinite. 

From the proof of Lemma 2.1 in [25] we know that L x (x, t/(x), a) is a 
monotone mapping. As a consequence, L(x,y(x),a) is a convex function 
[28). 

Since M is positive semidefinite, x * is a solution of (4.28) if and only if 
(. x*,y(x *)) is a Karush- Kuhn- Tucker point of (4.28). Similarly, x* is a min- 
imizer of L(x, y(x), a) if and only if L x (x*,y(x*),a) = 0, since L(x,t/(x),a) 
is convex. By Lemma 21 (1) and Theorem 26, (x*,t/(x*)) is a Karush-Kuhn- 
Tucker point of (4.28) if and only if L' x (x, y(x),a) = 0. Therefore, x* is a 
solution of (4.28) if and only if x* is a minimizer of L(x, y(x), a). ■ 

The above theorem was proved first by Li and Swetits [25] when m = n 
and A is a nonsingular matrix (cf. Proposition 27). Note that, if A is 
a nonsingular matrix, then (4.28) can be reformulated as a quadratic pro- 
gram with simple bound constraints by using the substitution x = Ax. For 
quadratic programs with simple bound constraints, we actually have the fol- 
lowing stronger result [21]. 

Proposition 29 Suppose that A is the identity matrix I n of order n and 
L(x,y,a ) is given by (4.29). Let 0 < 2o;||M|[ < 1. Then x* is a local 
solution (or an isolated local solution, or a global solution) of (4-28) if and 
only if x* is a local minimizer ( or an isolated local minimizer, or a global 
minimizer) of the quadratic spline L(x,b — Mx,a). 

Based on Theorem 28 and Proposition 29, we axe intrigued by the possi- 
bility of proving the following conjecture. 

Conjecture 30 Consider the following unconstrained minimization of a quadratic 
spline: 

min L (x, (AA T )~ 1 A(b — Mx),a) , (4-34) 

where L(x,y,a) is given by (4-29). There exists a positive constant a such 
that (4-28) and (4-34) have the same set of local solutions, the same set of 
isolated local solutions, and the same set of global solutions. 
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Remark. As we mentioned before, it is easy to reformulate (4.28) as a 
quadratic program with simple bound constraints if A is nonsingulax (ra = n) 
by using the substitution x = Ax. Therefore, by Theorem 28 and Proposition 
29, the unknown part of the above conjecture is when m < n and M is not 
positive semidefinite. 


4.4 Exact Penalty Function in Dual Vari- 
ables 

As we pointed out in Section 2, if the Lagrangian multiplier y can not be 
uniquely determined by x , then it is unlikely to find a differentiable exact 
penalty function depending only on i. In this section, we discuss the pos- 
sibility of deriving an exact penalty function in the dual variable y from 
the augmented Lagrangian function L(x,y,a). We assume that g(x) = Ax, 
where A is any m x n matrix, and the Jacobian f"{x ) of f(x) is always 
nonsingulax. Then (4.3) has a unique solution x(y) for any given y and 

Ay) = -Af"(x)~\ (4.35) 

For w = y, y'(w ) is the identity matrix I m . For any given y and g'(x ) = A T , 
by (4.15) and (4.35), 

Ly(x(y),y,<x) = x'(y)(f'(x) + g'{x)y) 

y)g'(x) + aim) - (g(x) + «y)“) (4.36) 

= \ (atl m - Af"{x)~ l A T ) ( g(x ) - (g(x) + ay)?) , 

since (x(y),y) satisfies (4.3). By Lemma 19, (x(y),y) is a Karush-Kuhn- 
Tucker point of (4.1) if and only if 

g(*(v)) - ( 4 - 37 ) 

When the matrix 

al m - Af"{x)-'A T (4.38) 

is nonsingular, by (4.36), (x(y),j/) is a Karush-Kuhn- Tucker point of (4.1) 
if and only if L' y (x(y),y,a) = 0. In the following two theorems, we state 
conditions on a which ensure that the matrix given in (4.38) is nonsingular. 
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Theorem 31 For any compact set T> of R m , let 

a* := max \\Af"(x(y))~ 1 A T \\ < oo. 

yEV 

Then , for a > a* and y € T>, L' y {x(y),y,a) = 0 if and only if ( x(y),y ) is a 
Karush-Kuhn-Tucker point of (4- 1). 

Theorem 32 Suppose that there exists a such that a > ||.A T / w (x) _1 A|[ for 
x in a subset V of R n . Then, for y G V, L' y (x(y),y,a) = 0 if and only if 
(x(y),y) is a Karush-Kuhn-Tucker point of (4-1 ) • 

Now consider the following quadratic programming problem: 

min -x T Mx — b T x subject to 1 < Ax < u, (4.39) 

x 2 

where A is any m x n matrix, M is an n x n nonsingular symmetric matrix, 
and b € R n . The corresponding augmented Lagrangian function L(x,y,a) 
has the form given in (4.29). 

Theorem 33 Let x(y) = A T y ) and L(x,y,a) be given by (4-29). 

Then, for a > ||A T M -1 y4||, L' y (x(y),y,a) = 0 if and only if (x(y),y) is a 
Karush-Kuhn-Tucker point of (4-1) • 

The above theorem is a special case of Theorem 32. Note that x(y) = 
M~ l {b — A T y) is an open and onto mapping when A has rank n. In this case, 
by Theorem 33 and Lemma 21, x(y*) is a local (or global) solution of (4.1), 
if y* is a local (or global) minimizer of L(x(y),y, a) with a > \\A T M~ l A\\. 
We can consider L(x(y),y,a) as an exact penalty function in dual variables. 

Corollary 34 Suppose that x(y) = M~ l (b— A T y ), L(x,y,a) is given by 
(4-29), and A has rank n. Then, for a > \\A T M~ l A\\, x(y*) is a local (or 
global) solution of (4-1) if y* is a local (or global) minimizer of L(x(y), y, a). 

The above result is an attempt to extend the following unconstrained 
reformulation of a strictly convex quadratic programming problem [24, 25]. 
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Proposition 35 Suppose that M is positive definite and a > 1 A||. 

Then x* := M -1 (6 - A T y*) is a solution of (4-39) and ( x*,y *) is a Karush- 
Tuhn-Tucker point of (4-39) if and only if y* is a minimizer of the following 
convex quadratic spline: 

m := %v T By - \\\(By - c) + f - |||(<£ - By) + || 2 , (4.40) 

where B = olI — AM~ l A? , c := ceAM~^b —l, and d := aAM l b — it. 

The penalty function $(y) was introduced based on a reformulation of 
the Karush-Kuhn-Tucker conditions for (4.39) [25]. In fact, one can easily 
verify that 

L(M -1 (6 - A T y ),y,a) = -$(y) - \b T M~ l b. (4-41) 

06 Z 

Based on Proposition 35 and Corollary 34, it is reasonable to believe that 
(4.39) is equivalent to the unconstrained minimization of the quadratic spline 
L(M~ l (b- A T y),y,a). 

Conjecture 36 Consider the following unconstrained minimization prob- 
lem: 

min L (M~ x ( b — A T y), y, a;) , (4-42) 

where L(x,y, a) is the corresponding augmented Lagrangian function of (4-39) 
given in (4-39). Then there exists a positive constant a such that y* is a local 
solution (or global solution) of (4-43) if and only if x* := M -1 ( 6 - A T y *) is 
a local solution (or global solution) of (4-39) and (re*,?/*) is a Karush-Kuhn- 
Tucker point of (4-39). 

Remark. For the dual unconstrained reformulation, there might be many 
dual solutions y* corresponding to one primal solution x*. Therefore, it 
is impossible to establish correspondence between isolated local solutions. 
The conjecture is true if M is a positive definite symmetric matrix [25] (cf. 
Proposition 35). 
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4.5 


Comments 


With a differentiable exact penalty function, one can use various uncon- 
strained minimization techniques to solve the original constrained minimiza- 
tion problem. See [3, 4, 5, 6, 7, 8, 9, 13, 14, 16, 19, 20, 23, 24, 25, 26] for 
some recent works as well as references on the subject. 

It is interesting to note that, when (x(u>), y(u>)) satisfies (4.3), one way 
to reformulate the Karush-Kuhn- Tucker conditions (4.8) is to find a function 
F(w) such that 

F'(w) = B{w)(g{x(w )) - (^(as(txi)) + ai/(u>))“), (4.43) 

where B{w) is a nonsingular matrix. Then (x(tu), y(u>)) is a Karush-Kuhn- 
Tucker point of (4.1) if and only if F'(w) = 0. If the original constrained 
minimization problem is convex, then it is equivalent to the unconstrained 
minimization of the penalty function F(w). The exact penalty functions $(x) 
and $(y) for convex quadratic programs with simple bound constraints and 
strictly convex quadratic programs, respectively, were constructed based on 
this approach [24, 25]. Later, Li proved that 'P(x) is a strongly exact penalty 
function for any quadratic program with simple bound constraints [21]. Note 
that, by Lemma 21, F(w) := L(x(w),y(w),a) satisfies (4.32) if (x(to), y(to)) 
is a solution of (4.3). The differentiable exact penalty functions ^(x) and 
4>(y) can actually be derived from the augmented Lagrangian function by 
eliminating either x or y from the equation (4.3) (cf. Sections 3 and 4). 

However, when the differentiable convex quadratic piecewise penalty func- 
tions either in the primal variable x or in the dual variable y were derived 
based on the reformulation (4.43) of the Karush-Kuhn- Tucker conditions, the 
special structures of the involved quadratic program seem to be crucial for 
the construction of these penalty functions. The author was asked, by P. 
Tseng, J.-S. Pang, and N. Gould in different circumstances, whether or not 
these penalty functions are related to differentiable penalty functions given by 
Di Pillo and Grippo or augmented Lagrangian functions. Our study reveals 
that the penalty functions derived for convex quadratic programs can ac- 
tually be derived from Hestenes-Powell-Rockafellar’s augmented Lagrangian 
function for two-sided constraints by eliminating either x or y by using the 
linear equation in the Karush-Kuhn- Tucker conditions. Furthermore, our 
results show that Fletcher’s multiplier function y(x) (if works) produces an 
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exact penalty function for a quadratic programming problem on R n and the 
penalty parameter can be easily determined by the 2-norm of the matrix 
(AA t )- 1 AMA t (AA t )~ 1 . 

The penalty function L(x(y),y, a) seems to be extremely useful when one 
has to solve a sequence of separable strictly convex quadratic programming 
problems which are closely related to one another: 

min — x T b k subject to l k < Ax < u k , (4.44) 

where A has rank m. Let Fk{y) ■= Lk{b k — A T y,y,a) be the corresponding 
differentiable piecewise quadratic exact penalty function of (4.44) as given in 
Theorem 33. In this special case, for a > ||A r A||, Fk(y) is actually a strictly 
convex quadratic spline (cf. [25]). In order to find the unique minimizer y* +1 
of Fk(y), we can use y k as the initial guess. If b k , l k , u k are small perturbations 
of ft* -1 , I* -1 , u* -1 , respectively, then y k+l should be close to y k and it is easy 
to find y k+1 by starting from y k . A data smoothing technique for piecewise 
convex/concave curves was developed based on the efficiency of a Newton 
method for finding the unique minimizer of Fk(y) starting from y k (cf. [22]). 
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